flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Question]: How to use NER with Sentence classification? Multi-task learning #3224

Open kdk2612 opened 1 year ago

kdk2612 commented 1 year ago

Question

Hi, I am interested in doing Multi-task learning for NER and Sentence classification on a custom dataset. I was trying to find some tutorials or starting points but could not find any. Can you please provide some guidance on how to approach this?

helpmefindaname commented 1 year ago

Hi @kdk2612, you can do the following:

from flair.embeddings import TransformerEmbeddings
from flair.datasets import SENTEVAL_SST_GRANULAR, CONLL_03
from flair.models import TextClassifier, SequenceTagger
from flair.nn.multitask import make_multitask_model_and_corpus
from flair.trainers import ModelTrainer

# --- Embeddings that are shared by both models --- #
# use a transformer that can produce both sentence-level and word-level embeddings
shared_embedding = TransformerEmbeddings("distilbert-base-uncased", fine_tune=True, is_token_embedding=True, is_document_embedding=True)

# --- Task 1: Sentiment Analysis (5-class) --- #
corpus_1 = SENTEVAL_SST_GRANULAR()

model_1 = TextClassifier(shared_embedding,
                         label_dictionary=corpus_1.make_label_dictionary("class"),
                         label_type="class")

# -- Task 2: Named Entity Recognition on CONLL03 (english) -- #
corpus_2 = CONLL_03()

model_2 = SequenceTagger(shared_embedding,
                         tag_dictionary=corpus_2.make_label_dictionary("ner"),
                         tag_type="ner",
                         )

# -- Define mapping (which tagger should train on which model) -- #
multitask_model, multicorpus = make_multitask_model_and_corpus(
    [
        (model_1, corpus_1),
        (model_2, corpus_2),
    ]
)

# -- Create model trainer and train -- #
trainer = ModelTrainer(multitask_model, multicorpus)
trainer.fine_tune("resources/taggers/multitask_test")
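
Note that training hyperparameters (learning rate, mini-batch size, number of epochs, ...) are passed to this fine_tune call on the trainer rather than set on the individual models; an expanded version of the same call, with purely illustrative values:

# the hyperparameter values below are illustrative, not tuned recommendations
trainer.fine_tune(
    "resources/taggers/multitask_test",
    learning_rate=5.0e-5,
    mini_batch_size=16,
    max_epochs=10,
)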

If you have the annotations aligned, such that each sentence has both classification and NER labels, you can combine them:

from flair.data import Corpus, Sentence
from flair.models import MultitaskModel

def parse_annotations(my_data) -> Sentence:
    # create a sentence with labels, might look different depending on your code
    sentence = Sentence(my_data["text"])
    sentence.add_label("class", my_data["label"])
    for ner_label in my_data["ner"]:
        sentence[ner_label["start"]: ner_label["end"]].add_label("ner", ner_label["label"])

    # mark the sentence for both tasks
    sentence.add_label("multitask_id", "Task_0")
    sentence.add_label("multitask_id", "Task_1")
    return sentence

corpus = Corpus(train=[parse_annotations(annotation) for annotation in train_annotations], dev=..., test=...)

# `use_all_tasks` refers to training: every sentence is trained on all tasks jointly.
# This is recommended if you use shared embeddings with aligned data, as it gathers
# more training signal per batch (e.g. 4 sentences -> 8 annotations).
multitask_model = MultitaskModel([model_1, model_2], use_all_tasks=True)
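
Training on this single, jointly-labelled corpus then works the same way as in the first example; a minimal sketch (the output path is just a placeholder):

# train both tasks jointly on the combined corpus
trainer = ModelTrainer(multitask_model, corpus)
trainer.fine_tune("resources/taggers/multitask_aligned")
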
kdk2612 commented 1 year ago

Thanks. A follow-up question: where do I change the parameters for learning? In the trainer object that is created using the ModelTrainer, or in the individual classifiers? Where can I find documentation that explains how to train, fine-tune, and modify parameters for multitask learning?

kdk2612 commented 1 year ago

@helpmefindaname Also, when I tried to use stacked embeddings with the multitask model I got the following error:

# 4. initialize embeddings
embedding_types: List[TokenEmbeddings] = [
    FlairEmbeddings('/home/ec2-user/SageMaker/efs/karan/SearchCitation/models/embeddings/forward-best-lm.pt'),
    FlairEmbeddings('/home/ec2-user/SageMaker/efs/karan/SearchCitation/models/embeddings/backward-best-lm.pt'),
]

shared_embedding: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

Error:

2023-05-02 20:48:11,386 ----------------------------------------------------------------------------------------------------
2023-05-02 20:48:11,386 Model: "MultitaskModel(
  (Task_0): TextClassifier(
    (embeddings): StackedEmbeddings(
      (list_embedding_0): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.1, inplace=False) (encoder): Embedding(275, 100) (rnn): LSTM(100, 1024) ) )
      (list_embedding_1): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.1, inplace=False) (encoder): Embedding(275, 100) (rnn): LSTM(100, 1024) ) )
    )
    (decoder): Linear(in_features=2048, out_features=3, bias=True)
    (dropout): Dropout(p=0.0, inplace=False)
    (locked_dropout): LockedDropout(p=0.0)
    (word_dropout): WordDropout(p=0.0)
    (loss_function): CrossEntropyLoss()
  )
  (Task_1): SequenceTagger(
    (embeddings): StackedEmbeddings(
      (list_embedding_0): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.1, inplace=False) (encoder): Embedding(275, 100) (rnn): LSTM(100, 1024) ) )
      (list_embedding_1): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.1, inplace=False) (encoder): Embedding(275, 100) (rnn): LSTM(100, 1024) ) )
    )
    (word_dropout): WordDropout(p=0.05)
    (locked_dropout): LockedDropout(p=0.5)
    (embedding2nn): Linear(in_features=2048, out_features=2048, bias=True)
    (rnn): LSTM(2048, 256, batch_first=True, bidirectional=True)
    (linear): Linear(in_features=512, out_features=3, bias=True)
    (loss_function): ViterbiLoss()
    (crf): CRF()
  )
)"
2023-05-02 20:48:11,387 ----------------------------------------------------------------------------------------------------
2023-05-02 20:48:11,388 Corpus: "MultiCorpus: 67680 train + 14500 dev + 14518 test sentences

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/flair/trainers/trainer.py:549, in ModelTrainer.train(self, base_path, learning_rate, mini_batch_size, eval_batch_size, mini_batch_chunk_size, max_epochs, train_with_dev, train_with_test, monitor_train, monitor_test, main_evaluation_metric, scheduler, anneal_factor, patience, min_learning_rate, initial_extra_patience, optimizer, cycle_momentum, warmup_fraction, embeddings_storage_mode, checkpoint, save_final_model, anneal_with_restarts, anneal_with_prestarts, anneal_against_dev_loss, batch_growth_annealing, shuffle, param_selection_mode, write_weights, num_workers, sampler, use_amp, amp_opt_level, eval_on_train_fraction, eval_on_train_shuffle, save_model_each_k_epochs, tensorboard_comment, use_swa, use_final_model_for_eval, gold_label_dictionary_for_eval, exclude_labels, create_file_logs, create_loss_file, epoch, use_tensorboard, tensorboard_log_dir, metrics_for_tensorboard, optimizer_state_dict, scheduler_state_dict, save_optimizer_state, reduce_transformer_vocab, shuffle_first_epoch, **kwargs)
    546 # forward and backward for batch
    547 for batch_step in batch_steps:
    548     # forward pass
--> 549     loss, datapoint_count = self.model.forward_loss(batch_step)
    550     average_over += datapoint_count
    551     # Backward

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/flair/models/multitask_model.py:74, in MultitaskModel.forward_loss(self, sentences)
     72 count = 0
     73 for task_id, split in batch_split.items():
---> 74     task_loss, task_count = self.tasks[task_id].forward_loss([sentences[i] for i in split])
     75     loss += self.loss_factors[task_id] * task_loss
     76     count += task_count

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/flair/nn/model.py:744, in DefaultClassifier.forward_loss(self, sentences)
    741 data_point_tensor = self._encode_data_points(sentences, data_points)
    743 # decode
--> 744 scores = self.decoder(data_point_tensor)
    746 # an optional masking step (no masking in most cases)
    747 scores = self._mask_scores(scores, data_points)

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x0 and 2048x3)

kdk2612 commented 1 year ago

Also, where can I find proper documentation? I cannot find any definitions of the classes.

kdk2612 commented 1 year ago

@alanakbik @helpmefindaname any thoughts on this??

helpmefindaname commented 1 year ago

Hi @kdk2612, I gave you an example with embeddings that can be both document embeddings and token embeddings. You are now trying to use embeddings that are only token embeddings, so this won't work for the TextClassifier, which expects document embeddings.
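
If you want to keep your FlairEmbeddings, one possible setup (a sketch under that assumption, not something discussed further in this thread) is to wrap the same token embeddings in a document-level embedding such as DocumentPoolEmbeddings for the TextClassifier, while the SequenceTagger keeps the token-level stack:

from flair.embeddings import DocumentPoolEmbeddings, FlairEmbeddings, StackedEmbeddings

# the embedding names are placeholders; substitute your own language model paths
flair_forward = FlairEmbeddings("news-forward")
flair_backward = FlairEmbeddings("news-backward")

# token-level embeddings for the SequenceTagger
token_embedding = StackedEmbeddings([flair_forward, flair_backward])

# document-level embeddings for the TextClassifier, pooling the same token embeddings
document_embedding = DocumentPoolEmbeddings([flair_forward, flair_backward])

The transformer setup from the first example remains the simpler way to share a single embedding between both models.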

You can always check the source code and see the exact definitions.

zrjohnnyl commented 5 months ago

What is the correct way to train two tasks on a single corpus? I tried to train two models together using a single corpus, but got an error that said RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.

from flair.datasets import CONLL_03_DUTCH
from flair.models import MultitaskModel
from flair.trainers import ModelTrainer

# shared_embedding and initialize_tagger are my own helpers, defined elsewhere
multitask_dataset = CONLL_03_DUTCH()
tasks = ['ner', 'pos']
model_1 = initialize_tagger(multitask_dataset, shared_embedding, tasks[0])
model_2 = initialize_tagger(multitask_dataset, shared_embedding, tasks[1])
multitask_model = MultitaskModel([model_1, model_2], use_all_tasks=True, task_ids=tasks)
trainer = ModelTrainer(multitask_model, multitask_dataset)
trainer.fine_tune('resources/taggers/sota-ner-flert',
                  learning_rate=5.0e-6,
                  max_epochs=20)

Traceback (most recent call last):
  in <module>:60
    ❱  60 trainer.fine_tune('resources/taggers/sota-ner-flert',
  /pyzr/active_venv/lib/python3.10/site-packages/flair/trainers/trainer.py:253 in fine_tune
    ❱ 253     return self.train_custom(
  /pyzr/active_venv/lib/python3.10/site-packages/flair/trainers/trainer.py:606 in train_custom
    ❱ 606     self._backward(scaler.scale(loss))
  /pyzr/active_venv/lib/python3.10/site-packages/flair/trainers/trainer.py:124 in _backward
    ❱ 124     loss.backward()
  /pyzr/active_venv/lib/python3.10/site-packages/torch/_tensor.py:487 in backward
    ❱ 487     torch.autograd.backward(
  /pyzr/active_venv/lib/python3.10/site-packages/torch/autograd/__init__.py:200 in backward
    ❱ 200     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the bac
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
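
One possible explanation, going by the earlier example in this thread rather than a confirmed answer: MultitaskModel.forward_loss routes each sentence to its task(s) via "multitask_id" labels, which make_multitask_model_and_corpus (or the manual labelling shown above) attaches to the sentences. A plain corpus such as CONLL_03_DUTCH() carries no such labels, so no task receives any sentences and the resulting loss has no gradient attached. A sketch of labelling the corpus by hand (an assumption, untested here; the label values must match the task_ids given to MultitaskModel):

# assumption: mark every sentence for both tasks so forward_loss can route it
for dataset_split in (multitask_dataset.train, multitask_dataset.dev, multitask_dataset.test):
    for sentence in dataset_split:
        sentence.add_label("multitask_id", "ner")
        sentence.add_label("multitask_id", "pos")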