explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

error in reduce_last when using spancat with ALBERT xlarge #11905

Closed. egumasa closed this issue 1 year ago.

egumasa commented 2 years ago

Hello,

I'm not sure whether the following is an error in the source code, but I get the error below when I train a spaCy spancat model with ALBERT xlarge. (Oddly, the same setup seems to work with a RoBERTa model without any issue, so I wasn't sure whether this is actually a Thinc problem.)

============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'tagger', 'parser', 'ner',
'trainable_transformer', 'span_finder', 'spancat']
ℹ Frozen components: ['transformer', 'parser', 'tagger', 'ner']
ℹ Set annotations on update for: ['span_finder']
ℹ Initial learn rate: 0.0
E    #       LOSS TRAIN...  LOSS SPAN_...  LOSS SPANCAT  SPAN_FINDE...  SPAN_FINDE...  SPAN_FINDE...  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE 
---  ------  -------------  -------------  ------------  -------------  -------------  -------------  ----------  ----------  ----------  ------
  0       0        8344.77          18.89       1601.06           0.46           0.23          72.91        0.08        0.04       23.38    0.15
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Epoch 1:  99%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎  | 99/100 [01:56<00:00,  1.12it/s]⚠ Aborting and saving the final best model. Encountered exception:
ValueError('all sequence lengths must be >= 0')
Traceback (most recent call last):
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/cli/_util.py", line 71, in setup_cli
    command(prog_name=COMMAND)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/typer/main.py", line 532, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/cli/train.py", line 45, in train_cli
    train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/cli/train.py", line 75, in train
    train_nlp(nlp, output_path, use_gpu=use_gpu, stdout=sys.stdout, stderr=sys.stderr)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/training/loop.py", line 122, in train
    raise e
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/training/loop.py", line 105, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/training/loop.py", line 226, in train_while_improving
    score, other_scores = evaluate()
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/training/loop.py", line 281, in evaluate
    scores = nlp.evaluate(dev_corpus(nlp))
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/language.py", line 1430, in evaluate
    for eg, doc in zip(examples, docs):
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/language.py", line 1589, in pipe
    for doc in docs:
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/util.py", line 1651, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/trainable_pipe.pyx", line 79, in pipe
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/util.py", line 1670, in raise_error
    raise e
  File "spacy/pipeline/trainable_pipe.pyx", line 75, in spacy.pipeline.trainable_pipe.TrainablePipe.pipe
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/spacy/pipeline/spancat.py", line 275, in predict
    scores = self.model.predict((docs, indices))  # type: ignore
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/model.py", line 315, in predict
    return self._func(self, X, is_train=False)[0]
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/layers/chain.py", line 55, in forward
    Y, inc_layer_grad = layer(X, is_train=is_train)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/model.py", line 291, in __call__
    return self._func(self, X, is_train=is_train)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/layers/chain.py", line 55, in forward
    Y, inc_layer_grad = layer(X, is_train=is_train)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/model.py", line 291, in __call__
    return self._func(self, X, is_train=is_train)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in forward
    Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in <listcomp>
    Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/model.py", line 291, in __call__
    return self._func(self, X, is_train=is_train)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/layers/reduce_last.py", line 21, in forward
    Y, lasts = model.ops.reduce_last(cast(Floats2d, Xr.data), Xr.lengths)
  File "/Users/masakieguchi/opt/miniforge3/envs/spancat/lib/python3.9/site-packages/thinc/backends/ops.py", line 1204, in reduce_last
    raise ValueError(f"all sequence lengths must be >= 0")
ValueError: all sequence lengths must be >= 0

Here is my training environment:

# Name                    Version                   Build  Channel
altair                    4.2.0                    pypi_0    pypi
attrs                     22.1.0                   pypi_0    pypi
blinker                   1.5                      pypi_0    pypi
blis                      0.7.9                    pypi_0    pypi
bzip2                     1.0.8                h3422bc3_4    conda-forge
ca-certificates           2022.9.24            h4653dfc_0    conda-forge
cachetools                5.2.0                    pypi_0    pypi
catalogue                 2.0.8                    pypi_0    pypi
certifi                   2022.9.24                pypi_0    pypi
charset-normalizer        2.1.1                    pypi_0    pypi
click                     8.1.3                    pypi_0    pypi
commonmark                0.9.1                    pypi_0    pypi
confection                0.0.3                    pypi_0    pypi
cymem                     2.0.7                    pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
en-core-web-trf           3.4.1                    pypi_0    pypi
entrypoints               0.4                      pypi_0    pypi
filelock                  3.8.0                    pypi_0    pypi
gitdb                     4.0.10                   pypi_0    pypi
gitpython                 3.1.29                   pypi_0    pypi
huggingface-hub           0.8.1                    pypi_0    pypi
idna                      3.4                      pypi_0    pypi
importlib-metadata        5.1.0                    pypi_0    pypi
jinja2                    3.1.2                    pypi_0    pypi
jsonschema                4.17.1                   pypi_0    pypi
langcodes                 3.3.0                    pypi_0    pypi
libffi                    3.4.2                h3422bc3_5    conda-forge
libsqlite                 3.40.0               h76d750c_0    conda-forge
libzlib                   1.2.13               h03a7124_4    conda-forge
markupsafe                2.1.1                    pypi_0    pypi
murmurhash                1.0.9                    pypi_0    pypi
ncurses                   6.3                  h07bb92c_1    conda-forge
numpy                     1.23.5                   pypi_0    pypi
openssl                   3.0.7                h03a7124_0    conda-forge
packaging                 21.3                     pypi_0    pypi
pandas                    1.5.2                    pypi_0    pypi
pathy                     0.10.0                   pypi_0    pypi
pillow                    9.3.0                    pypi_0    pypi
pip                       22.3.1             pyhd8ed1ab_0    conda-forge
preshed                   3.0.8                    pypi_0    pypi
protobuf                  3.20.3                   pypi_0    pypi
pyarrow                   10.0.1                   pypi_0    pypi
pydantic                  1.10.2                   pypi_0    pypi
pydeck                    0.8.0                    pypi_0    pypi
pygments                  2.13.0                   pypi_0    pypi
pympler                   1.0.1                    pypi_0    pypi
pyparsing                 3.0.9                    pypi_0    pypi
pyrsistent                0.19.2                   pypi_0    pypi
python                    3.9.15          hea58f1e_0_cpython    conda-forge
python-dateutil           2.8.2                    pypi_0    pypi
pytz                      2022.6                   pypi_0    pypi
pytz-deprecation-shim     0.1.0.post0              pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
readline                  8.1.2                h46ed386_0    conda-forge
regex                     2022.10.31               pypi_0    pypi
requests                  2.28.1                   pypi_0    pypi
rich                      12.6.0                   pypi_0    pypi
semver                    2.13.0                   pypi_0    pypi
setuptools                65.6.3                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
smart-open                5.2.1                    pypi_0    pypi
smmap                     5.0.0                    pypi_0    pypi
spacy                     3.4.3                    pypi_0    pypi
spacy-alignments          0.8.6                    pypi_0    pypi
spacy-experimental        0.6.0                    pypi_0    pypi
spacy-huggingface-hub     0.0.7                    pypi_0    pypi
spacy-legacy              3.0.10                   pypi_0    pypi
spacy-loggers             1.0.3                    pypi_0    pypi
spacy-transformers        1.1.8                    pypi_0    pypi
srsly                     2.4.5                    pypi_0    pypi
streamlit                 1.15.1                   pypi_0    pypi
thinc                     8.1.5                    pypi_0    pypi
thinc-apple-ops           0.1.2                    pypi_0    pypi
tk                        8.6.12               he1e0b03_0    conda-forge
tokenizers                0.12.1                   pypi_0    pypi
toml                      0.10.2                   pypi_0    pypi
toolz                     0.12.0                   pypi_0    pypi
torch                     1.11.0                   pypi_0    pypi
tornado                   6.2                      pypi_0    pypi
tqdm                      4.64.1                   pypi_0    pypi
transformers              4.21.3                   pypi_0    pypi
typer                     0.4.2                    pypi_0    pypi
typing-extensions         4.4.0                    pypi_0    pypi
tzdata                    2022.6                   pypi_0    pypi
tzlocal                   4.2                      pypi_0    pypi
urllib3                   1.26.13                  pypi_0    pypi
validators                0.20.0                   pypi_0    pypi
wasabi                    0.10.1                   pypi_0    pypi
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h57fd34a_0    conda-forge
zipp                      3.10.0                   pypi_0    pypi

I use miniforge3 on an M1 Max MacBook Pro.

polm commented 1 year ago

Sorry you're having trouble with this. This looks like more of a spaCy issue, let me transfer it...

polm commented 1 year ago

Can you share your config? In particular it'd help to check exactly which version of ALBERT xlarge you're using.

Also, were you using the same version of spaCy when it worked? This looks like it might be the same as #11861. In that issue, the underlying problem is that a suggester that sometimes provides docs without spans isn't handled correctly. In 3.3 the code will run (while probably doing the wrong thing), but in 3.4 it will give the same error you got.
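
For reference, here is a minimal sketch of the failure mode as I understand it: if the ragged input to reduce_last contains a zero-length sequence, e.g. for a doc where the suggester proposed no spans, the length check rejects it. The values below are made up for illustration and this is not the exact spaCy code path:

import numpy
from thinc.api import NumpyOps

ops = NumpyOps()
# Two sequences: the first has 2 rows of data, the second is empty
# (e.g. a doc for which the suggester proposed no spans).
data = numpy.zeros((2, 4), dtype="f")
lengths = numpy.asarray([2, 0], dtype="i")
# With a zero-length entry this is expected to raise:
# ValueError: all sequence lengths must be >= 0
ops.reduce_last(data, lengths)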

egumasa commented 1 year ago

Hi polm,

Thank you for transferring the post.

Indeed, at some point I switched to version 3.4. I downgraded to 3.3 and, sure enough, it no longer throws the error. Thank you!

However, I still have the issue of zero scores, as shown below. With models other than RoBERTa-base, the scores drop to zero early in training and never recover.

E    #       LOSS TRAIN...  LOSS SPAN_...  LOSS SPANCAT  SPAN_FINDE...  SPAN_FINDE...  SPAN_FINDE...  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE 
---  ------  -------------  -------------  ------------  -------------  -------------  -------------  ----------  ----------  ----------  ------ 
  0       0       99307.98          34.13      11957.44           0.45           0.22          64.09        0.02        0.01        6.40    0.12 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... Done. 1.0s 
  0     100     5905736.42        2463.12     196776.72           5.55           3.20          20.99        0.00        0.00        0.00    0.04 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... 
  0     200          64.23        1534.61       1172.05           0.00           0.00           0.00        0.00        0.00        0.00    0.00 
  0     300          28.45        1640.76        921.27           0.00           0.00           0.00        0.00        0.00        0.00    0.00 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... Done. 1.0s 
  0     400          24.85        1598.82        907.55           0.00           0.00           0.00        0.00        0.00        0.00    0.00 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... 
  1     500      216767.50        1776.50      72094.24           0.00           0.00           0.00        0.00        0.00        0.00    0.00 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... 
  1     600          15.51        1534.36        851.89           0.00           0.00           0.00        0.00        0.00        0.00    0.00 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... 
  1     700          15.90        1540.49        862.97           0.00           0.00           0.00        0.00        0.00        0.00    0.00 
  1     800          13.65        1636.51        919.89           0.00           0.00           0.00        0.00        0.00        0.00    0.00 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... Done. 1.0s 
  2     900          10.47        1593.23        893.89           0.00           0.00           0.00        0.00        0.00        0.00    0.00 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... Done. 1.0s 
  2    1000           9.92        1619.86        923.88           0.00           0.00           0.00        0.00        0.00        0.00    0.00 
wandb: Adding directory to artifact (./training/spancat/engagement_spl/ALBERT_l_context_flz_span_finder/model-last)... 

Here are my configs; I have RoBERTa-base, RoBERTa-large, ALBERT-base, ALBERT-large, and ALBERT-xlarge.

Working: RoBERTa-base, ALBERT-base.

Not working (scores drop to zero early in training and do not recover): RoBERTa-large, ALBERT-large, ALBERT-xlarge.

Now that training runs without an error, I wonder whether this is related to the learning rate. Is applying the same learning rate to a larger model a bad idea?

RoBERTa-base (worked; config shown in full)

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
source = "en_core_web_trf"

[vars]
spans_key = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "en"
pipeline = ["transformer", "tagger", "parser", "ner", "trainable_transformer", "span_finder", "spancat"]
batch_size = 64
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.transformer]
source = "en_core_web_trf"

[components.tagger]
source = ${paths.source}
#upstream = "*"
# replace_listeners = ["model.transformer"]

[components.tagger.model]
@architectures = "spacy.Tagger.v2"
nO = null
normalize = false

[components.tagger.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}

[components.parser]
source = ${paths.source}

[components.parser.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "parser"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.parser.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}

[components.ner]
source = ${paths.source}

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}

[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "roberta-base"

[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147

[components.trainable_transformer.model.tokenizer_config]
use_fast = true

[components.span_finder]
factory = "experimental_span_finder"
threshold = 0.2
predicted_key = "span_candidates"
training_key = ${vars.spans_key}
min_length = 0
max_length = 0

[components.span_finder.scorer]
@scorers = "spacy-experimental.span_finder_scorer.v1"
predicted_key = ${components.span_finder.predicted_key}
training_key = ${vars.spans_key}

[components.span_finder.model]
@architectures = "spacy-experimental.SpanFinder.v1"

[components.span_finder.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = 2

[components.span_finder.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "trainable_transformer"

[components.span_finder.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[components.spancat]
factory = "spancat"
max_positive = 2
spans_key = ${vars.spans_key}
threshold = 0.5

[components.spancat.model]
@architectures = "spacy.SpanCategorizer.v1"

[components.spancat.model.reducer]
@layers = "spacy.mean_max_reducer.v1"
hidden_size = 256

[components.spancat.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = null
nI = null

[components.spancat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "trainable_transformer"

[components.spancat.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[components.spancat.suggester]
@misc = "spacy-experimental.span_finder_suggester.v1"
candidates_key = ${components.span_finder.predicted_key}

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 2000
gold_preproc = false
limit = 0
augmenter = null

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
# patience = 200
patience = 1500
max_epochs = 0
max_steps = 20000
# max_steps = 1000
eval_frequency = 100
frozen_components = ["transformer", "parser", "tagger", "ner"]
annotating_components = ["span_finder"]
before_to_disk = null

# [training.batcher]
# @batchers = "spacy.batch_by_sequence.v1"
# get_length = null

# [training.batcher.size]
# @schedules = "compounding.v1"
# start = 32
# stop = 256
# compound = 1.001

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 200
stop = 800
compound = 1.0005

# [training.logger]
# @loggers = "spacy.ConsoleLogger.v1"
# progress_bar = true

[training.logger]
@loggers = "spacy.WandbLogger.v3"
project_name = "spnacat_engagementv2"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
# log_dataset_dir = "./corpus"
model_log_interval = 100
entity="********"
run_name = "NoMONOGLOSS_S_20221030"

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005

[training.score_weights]
span_finder_span_candidates_f = 0.0
span_finder_span_candidates_p = 0.0
span_finder_span_candidates_r = 0.2
spans_sc_p = 0.1
spans_sc_r = 0.1
spans_sc_f = 0.7
dep_las_per_type = null
sents_p = null
sents_r = null
ents_per_type = null
tag_acc = null
dep_uas = null
dep_las = null
sents_f = null
ents_f = null
ents_p = null
ents_r = null
lemma_acc = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]

RoBERTa-large (score drops to zero early in training)

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
source = "en_core_web_trf"

[vars]
spans_key = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "en"
pipeline = ["transformer", "tagger", "parser", "ner", "trainable_transformer", "span_finder", "spancat"]
batch_size = 32
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

=== Omitted for brevity; essentially the same as the RoBERTa-base config ===

[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "roberta-large"

[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147

[components.trainable_transformer.model.tokenizer_config]
use_fast = true

[components.span_finder]
factory = "experimental_span_finder"
threshold = 0.2
predicted_key = "span_candidates"
training_key = ${vars.spans_key}
min_length = 0
max_length = 0

[components.span_finder.scorer]
@scorers = "spacy-experimental.span_finder_scorer.v1"
predicted_key = ${components.span_finder.predicted_key}
training_key = ${vars.spans_key}

[components.span_finder.model]
@architectures = "spacy-experimental.SpanFinder.v1"

[components.span_finder.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = 2

[components.span_finder.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "trainable_transformer"

[components.span_finder.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[components.spancat]
factory = "spancat"
max_positive = 2
spans_key = ${vars.spans_key}
threshold = 0.5

[components.spancat.model]
@architectures = "spacy.SpanCategorizer.v1"

[components.spancat.model.reducer]
@layers = "spacy.mean_max_reducer.v1"
hidden_size = 256

[components.spancat.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = null
nI = null

[components.spancat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "trainable_transformer"

[components.spancat.model.tok2vec.pooling]
@layers = "reduce_mean.v1"

[components.spancat.suggester]
@misc = "spacy-experimental.span_finder_suggester.v1"
candidates_key = ${components.span_finder.predicted_key}

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 2000
gold_preproc = false
limit = 0
augmenter = null

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1200
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = ["transformer", "parser", "tagger", "ner"]
annotating_components = ["span_finder"]
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 400
compound = 1.0002

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = true

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.0001

[training.score_weights]
span_finder_span_candidates_f = 0.0
span_finder_span_candidates_p = 0.0
span_finder_span_candidates_r = 0.2
spans_sc_p = 0.1
spans_sc_r = 0.1
spans_sc_f = 0.7
dep_las_per_type = null
sents_p = null
sents_r = null
ents_per_type = null
tag_acc = null
dep_uas = null
dep_las = null
sents_f = null
ents_f = null
ents_p = null
ents_r = null
lemma_acc = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]

ALBERT-base (worked)

[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "albert-base-v2"

[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147

[components.trainable_transformer.model.tokenizer_config]
use_fast = true

ALBERT-large (prediction scores drop to zero early in training and do not recover)

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}

[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "albert-large-v2"

[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147

[components.trainable_transformer.model.tokenizer_config]
use_fast = true

ALBERT-xlarge (the model that triggered the original error)


[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}

[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "albert-xlarge-v2"

[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147

[components.trainable_transformer.model.tokenizer_config]
use_fast = true

For all of these models, the batcher and Adam optimizer settings are identical.

Thank you in advance for your help.

polm commented 1 year ago

Sorry for not being clearer, but as mentioned above, I think that while the code runs without error in 3.3, it may not be doing the right thing. For the models that "work", I would check that their outputs actually make sense.

Setting the right learning rate can be tricky, especially with large models, but if you get it wrong you usually won't have flat zeroes as a result.
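
If it helps, here is one way to eyeball the predictions. This is only a sketch: the model path is a placeholder for wherever spacy train wrote model-best, and your spans key is whatever vars.spans_key was set to in the config:

import spacy

# Placeholder path: point this at the model-best directory from training.
nlp = spacy.load("path/to/model-best")
doc = nlp("An example sentence to check whether the trained spancat still predicts spans.")

# Print every span group on the doc; this should include the span_finder
# candidates (predicted_key = "span_candidates") and the categorized spans
# stored under the training spans_key.
for key, group in doc.spans.items():
    print(key, [(span.text, span.label_) for span in group])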

polm commented 1 year ago

This should be resolved by #11860 if you want to try building from master.

egumasa commented 1 year ago

Thank you so much for keeping me in the loop! I will test it later this week.

github-actions[bot] commented 1 year ago

This issue has been automatically closed because it was answered and there was no follow-up discussion.

github-actions[bot] commented 1 year ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.