Closed — egumasa, closed 1 year ago

Hello,

I was not sure whether the following is an error in the source code, but I get the following error when I train a spaCy spancat model with ALBERT xlarge. (The odd thing is that it seems to work with the RoBERTa model without issue, so I was not sure whether this is an issue for thinc.)

Here is my training environment: I use miniforge3 on an M1 Max MBP.
Sorry you're having trouble with this. It looks like more of a spaCy issue, so let me transfer it...
Can you share your config? In particular it'd help to check exactly which version of ALBERT xlarge you're using.
Also, were you using the same version of spaCy when it worked? This looks like it might be the same as #11861. In that issue, the underlying problem is that a suggester that sometimes provides docs without spans isn't handled correctly. In 3.3 the code will run (while probably doing the wrong thing), whereas in 3.4 it raises the error you saw.
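If you want to check whether that's what's happening in your case, a quick sanity check along these lines should show whether the suggester is handing spancat docs with no candidate spans. This is only a rough sketch; the model path, the spans key, and the example texts are placeholders you would need to replace with your own:

import spacy

# Load the trained pipeline (placeholder path; point this at your model-last or model-best).
nlp = spacy.load("path/to/model-last")

texts = [
    "A few sentences taken from your dev data.",
    "Another example to sanity-check the pipeline.",
]

for doc in nlp.pipe(texts):
    # Candidates written by span_finder (the predicted_key in the config).
    candidates = doc.spans.get("span_candidates", [])
    # Final spans written by spancat (the spans_key from the config; placeholder name here).
    predicted = doc.spans.get("your_spans_key", [])
    print(f"candidates={len(candidates)}  predicted={len(predicted)}")
    for span in predicted:
        print("   ", span.label_, span.text)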
Hi polm,
Thank you for transferring the post.
Indeed, I think at some point I switched to version 3.4. I downgraded to 3.3, and it no longer throws the error. Thank you!
However, I still have the issue of zero scores, as shown below. With models other than RoBERTa-base, the scores drop to zero early in training and never recover.
E     #       LOSS TRAIN...  LOSS SPAN_...  LOSS SPANCAT  SPAN_FINDE...  SPAN_FINDE...  SPAN_FINDE...  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE
---   ------  -------------  -------------  ------------  -------------  -------------  -------------  ----------  ----------  ----------  ------
  0        0       99307.98          34.13      11957.44           0.45           0.22          64.09        0.02        0.01        6.40    0.12
  0      100     5905736.42        2463.12     196776.72           5.55           3.20          20.99        0.00        0.00        0.00    0.04
  0      200          64.23        1534.61       1172.05           0.00           0.00           0.00        0.00        0.00        0.00    0.00
  0      300          28.45        1640.76        921.27           0.00           0.00           0.00        0.00        0.00        0.00    0.00
  0      400          24.85        1598.82        907.55           0.00           0.00           0.00        0.00        0.00        0.00    0.00
  1      500      216767.50        1776.50      72094.24           0.00           0.00           0.00        0.00        0.00        0.00    0.00
  1      600          15.51        1534.36        851.89           0.00           0.00           0.00        0.00        0.00        0.00    0.00
  1      700          15.90        1540.49        862.97           0.00           0.00           0.00        0.00        0.00        0.00    0.00
  1      800          13.65        1636.51        919.89           0.00           0.00           0.00        0.00        0.00        0.00    0.00
  2      900          10.47        1593.23        893.89           0.00           0.00           0.00        0.00        0.00        0.00    0.00
  2     1000           9.92        1619.86        923.88           0.00           0.00           0.00        0.00        0.00        0.00    0.00
Here are my configs. I have RoBERTa-base (which worked), RoBERTa-large, ALBERT-base, ALBERT-large, and ALBERT-xl.

Now that it runs without an error, I wonder whether this is related to the learning rate. Is applying the same learning rate to a larger model a bad idea?

Working (RoBERTa-base):
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
source = "en_core_web_trf"
[vars]
spans_key = null
[system]
gpu_allocator = "pytorch"
seed = 0
[nlp]
lang = "en"
pipeline = ["transformer", "tagger", "parser", "ner", "trainable_transformer", "span_finder", "spancat"]
batch_size = 64
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.transformer]
source = "en_core_web_trf"
[components.tagger]
source = ${paths.source}
#upstream = "*"
# replace_listeners = ["model.transformer"]
[components.tagger.model]
@architectures = "spacy.Tagger.v2"
nO = null
normalize = false
[components.tagger.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}
[components.parser]
source = ${paths.source}
[components.parser.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "parser"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null
[components.parser.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}
[components.ner]
source = ${paths.source}
[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null
[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}
[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "roberta-base"
[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147
[components.trainable_transformer.model.tokenizer_config]
use_fast = true
[components.span_finder]
factory = "experimental_span_finder"
threshold = 0.2
predicted_key = "span_candidates"
training_key = ${vars.spans_key}
min_length = 0
max_length = 0
[components.span_finder.scorer]
@scorers = "spacy-experimental.span_finder_scorer.v1"
predicted_key = ${components.span_finder.predicted_key}
training_key = ${vars.spans_key}
[components.span_finder.model]
@architectures = "spacy-experimental.SpanFinder.v1"
[components.span_finder.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = 2
[components.span_finder.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "trainable_transformer"
[components.span_finder.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
[components.spancat]
factory = "spancat"
max_positive = 2
spans_key = ${vars.spans_key}
threshold = 0.5
[components.spancat.model]
@architectures = "spacy.SpanCategorizer.v1"
[components.spancat.model.reducer]
@layers = "spacy.mean_max_reducer.v1"
hidden_size = 256
[components.spancat.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = null
nI = null
[components.spancat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "trainable_transformer"
[components.spancat.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
[components.spancat.suggester]
@misc = "spacy-experimental.span_finder_suggester.v1"
candidates_key = ${components.span_finder.predicted_key}
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 2000
gold_preproc = false
limit = 0
augmenter = null
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
# patience = 200
patience = 1500
max_epochs = 0
max_steps = 20000
# max_steps = 1000
eval_frequency = 100
frozen_components = ["transformer", "parser", "tagger", "ner"]
annotating_components = ["span_finder"]
before_to_disk = null
# [training.batcher]
# @batchers = "spacy.batch_by_sequence.v1"
# get_length = null
# [training.batcher.size]
# @schedules = "compounding.v1"
# start = 32
# stop = 256
# compound = 1.001
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
[training.batcher.size]
@schedules = "compounding.v1"
start = 200
stop = 800
compound = 1.0005
# [training.logger]
# @loggers = "spacy.ConsoleLogger.v1"
# progress_bar = true
[training.logger]
@loggers = "spacy.WandbLogger.v3"
project_name = "spnacat_engagementv2"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
# log_dataset_dir = "./corpus"
model_log_interval = 100
entity = "********"
run_name = "NoMONOGLOSS_S_20221030"
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005
[training.score_weights]
span_finder_span_candidates_f = 0.0
span_finder_span_candidates_p = 0.0
span_finder_span_candidates_r = 0.2
spans_sc_p = 0.1
spans_sc_r = 0.1
spans_sc_f = 0.7
dep_las_per_type = null
sents_p = null
sents_r = null
ents_per_type = null
tag_acc = null
dep_uas = null
dep_las = null
sents_f = null
ents_f = null
ents_p = null
ents_r = null
lemma_acc = null
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]

Not working (RoBERTa-large; scores drop to zero early in training and do not recover):
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
source = "en_core_web_trf"
[vars]
spans_key = null
[system]
gpu_allocator = "pytorch"
seed = 0
[nlp]
lang = "en"
pipeline = ["transformer", "tagger", "parser", "ner", "trainable_transformer", "span_finder", "spancat"]
batch_size = 32
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
=== Omitted for brevity but basically the same as RoBERTa-base ===
[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "roberta-large"
[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147
[components.trainable_transformer.model.tokenizer_config]
use_fast = true
[components.span_finder]
factory = "experimental_span_finder"
threshold = 0.2
predicted_key = "span_candidates"
training_key = ${vars.spans_key}
min_length = 0
max_length = 0
[components.span_finder.scorer]
@scorers = "spacy-experimental.span_finder_scorer.v1"
predicted_key = ${components.span_finder.predicted_key}
training_key = ${vars.spans_key}
[components.span_finder.model]
@architectures = "spacy-experimental.SpanFinder.v1"
[components.span_finder.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = 2
[components.span_finder.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "trainable_transformer"
[components.span_finder.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
[components.spancat]
factory = "spancat"
max_positive = 2
spans_key = ${vars.spans_key}
threshold = 0.5
[components.spancat.model]
@architectures = "spacy.SpanCategorizer.v1"
[components.spancat.model.reducer]
@layers = "spacy.mean_max_reducer.v1"
hidden_size = 256
[components.spancat.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = null
nI = null
[components.spancat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "trainable_transformer"
[components.spancat.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
[components.spancat.suggester]
@misc = "spacy-experimental.span_finder_suggester.v1"
candidates_key = ${components.span_finder.predicted_key}
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 2000
gold_preproc = false
limit = 0
augmenter = null
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1200
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = ["transformer", "parser", "tagger", "ner"]
annotating_components = ["span_finder"]
before_to_disk = null
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 400
compound = 1.0002
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = true
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.0001
[training.score_weights]
span_finder_span_candidates_f = 0.0
span_finder_span_candidates_p = 0.0
span_finder_span_candidates_r = 0.2
spans_sc_p = 0.1
spans_sc_r = 0.1
spans_sc_f = 0.7
dep_las_per_type = null
sents_p = null
sents_r = null
ents_per_type = null
tag_acc = null
dep_uas = null
dep_las = null
sents_f = null
ents_f = null
ents_p = null
ents_r = null
lemma_acc = null
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]

Also not working: the ALBERT variants (the rest of each config is the same as above):
[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "albert-base-v2"
[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147
[components.trainable_transformer.model.tokenizer_config]
use_fast = true
[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}
[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "albert-large-v2"
[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147
[components.trainable_transformer.model.tokenizer_config]
use_fast = true
[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
upstream = "transformer"
pooling = {"@layers":"reduce_mean.v1"}
[components.trainable_transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
[components.trainable_transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "albert-xlarge-v2"
[components.trainable_transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window = 128
# stride = 96
window = 196
stride = 147
[components.trainable_transformer.model.tokenizer_config]
use_fast = true
For these models, the batch settings and Adam settings are constant.
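Side note: since only the model name changes between these variants, the same comparison could also be run from one base config by overriding the model name on the command line. For example (base_config.cfg, the output directory, and the corpus paths below are placeholders for your own files):

python -m spacy train base_config.cfg --output ./training/albert_xl \
    --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy \
    --components.trainable_transformer.model.name albert-xlarge-v2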
Thank you so much for your help in advance.
Sorry for not being clearer, but as mentioned earlier in this thread, I think that while the code runs without error in 3.3, it may not be doing the right thing. For the models that "work", I would check that their outputs make sense.
Setting the right learning rate can be tricky, especially with large models, but if you get it wrong you usually won't have flat zeroes as a result.
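If you do want to experiment with the learning rate for the larger models, a more conservative schedule might look something like the excerpt below; the numbers are purely illustrative, not values tuned for your data:

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 500
total_steps = 20000
initial_rate = 0.00002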
This should be resolved by #11860 if you want to try building from master.
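For reference, installing from master is usually something like the following, assuming you have a working compiler toolchain (spaCy's Cython sources are compiled when installing from the repo):

pip install git+https://github.com/explosion/spaCy.git@master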
Thank you so much for keeping me in the loop! I will test it later this week.
This issue has been automatically closed because it was answered and there was no follow-up discussion.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.