Open · iambankaratharva opened 1 year ago
Hello @iambankaratharva, it could be that your learning rate is too high for XLM-RoBERTa-Large. This model is very large, so we typically use a much smaller learning rate, around 5e-6.
We also recommend using the fine_tune method, as illustrated in the script here.
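A rough sketch of that recommendation, assuming the flair library; the data folder, column layout, and output path below are placeholders, and the exact API surface (e.g. make_label_dictionary vs. the older make_tag_dictionary) varies across flair versions:

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Placeholder folder/columns -- point these at your BIO-formatted files.
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"})
label_dict = corpus.make_label_dictionary(label_type="ner")

# FLERT-style setup: fine-tune the transformer itself, no extra LSTM/CRF.
embeddings = TransformerWordEmbeddings("xlm-roberta-large", fine_tune=True)
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/xlmr-large",  # placeholder output path
    learning_rate=5e-6,              # small LR for a very large model
    mini_batch_size=4,
    max_epochs=10,
)
```

Running this requires the corpus files on disk and downloads the xlm-roberta-large checkpoint, so treat it as a template rather than a drop-in script.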
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Question
Hi, I have data in BIO format (not BIOES). I am training a sequence tagger model with transformer embeddings, but I consistently get an F1 score of 0 at every epoch with XLM-ROBERTA-LARGE, while other models (BERT-BASE-UNCASED) give a non-zero F1 score. Could you please help me understand why? I can confirm that the loss decreases consistently. Code for XLM-ROBERTA-LARGE below:
Training data snapshot:
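One failure mode consistent with these symptoms (an illustration, not a diagnosis from the thread): with too high a learning rate, a large model can collapse to predicting "O" for every token. The token-level loss can still decrease, but span-level F1, which is computed over whole BIO entity spans, stays exactly 0 because no entity span is ever predicted. A minimal sketch with hypothetical helper functions (this is not flair's own evaluation code):

```python
def bio_spans(tags):
    """Extract (start, end, label) entity spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # span keeps growing
        else:
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags), label))
    return spans

def span_f1(gold_tags, pred_tags):
    """Span-level micro F1: a span counts only if boundaries and label match."""
    gold = set(bio_spans(gold_tags))
    pred = set(bio_spans(pred_tags))
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = ["B-PER", "I-PER", "O", "B-LOC"]
print(span_f1(gold, ["O"] * 4))  # all-"O" predictions: F1 is 0.0
print(span_f1(gold, gold))       # perfect predictions: F1 is 1.0
```

If flair's per-epoch evaluation log shows 0 predicted spans alongside a falling loss, that would point to this collapse, and a smaller learning rate is the usual remedy.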