Open kdk2612 opened 1 year ago
Hi @kdk2612 You can do the following:
from flair.embeddings import TransformerEmbeddings
from flair.datasets import SENTEVAL_SST_GRANULAR, CONLL_03
from flair.models import TextClassifier, SequenceTagger
from flair.nn.multitask import make_multitask_model_and_corpus
from flair.trainers import ModelTrainer
# --- Embeddings that are shared by both models --- #
# use a transformer that can do both, sentence-embedding and word-embedding
shared_embedding = TransformerEmbeddings("distilbert-base-uncased", fine_tune=True, is_token_embedding=True, is_document_embedding=True)
# --- Task 1: Sentiment Analysis (5-class) --- #
corpus_1 = SENTEVAL_SST_GRANULAR()
model_1 = TextClassifier(shared_embedding,
label_dictionary=corpus_1.make_label_dictionary("class"),
label_type="class")
# -- Task 2: Named Entity Recognition on CONLL03 (english) -- #
corpus_2 = CONLL_03()
model_2 = SequenceTagger(shared_embedding,
label_dictionary=corpus_2.make_label_dictionary("ner"),
label_type="ner",
)
# -- Define mapping (which model should train on which corpus) -- #
multitask_model, multicorpus = make_multitask_model_and_corpus(
[
(model_1, corpus_1),
(model_2, corpus_2),
]
)
# -- Create model trainer and train -- #
trainer = ModelTrainer(multitask_model, multicorpus)
trainer.fine_tune("resources/taggers/multitask_test")
If your annotations are aligned, so that each sentence has both classification and NER labels, you can combine them:
from flair.data import Corpus, Sentence
from flair.models import MultitaskModel

def parse_annotations(my_data) -> Sentence:
    # create a sentence with labels, might look different depending on your code
    sentence = Sentence(my_data["text"])
    sentence.add_label("class", my_data["label"])
    for ner_label in my_data["ner"]:
        sentence[ner_label["start"]: ner_label["end"]].add_label("ner", ner_label["label"])
    # mark sentence for both tasks.
    sentence.add_label("multitask_id", "Task_0")
    sentence.add_label("multitask_id", "Task_1")
    return sentence

corpus = Corpus(train=[parse_annotations(annotation) for annotation in train_annotations], dev=..., test=...)
# `use_all_tasks` applies to training: every sentence is trained on all of its tasks jointly.
# This is recommended for shared embeddings with aligned data, because each batch then
# carries more training signal (e.g. 4 sentences -> 8 annotations).
multitask_model = MultitaskModel([model_1, model_2], use_all_tasks=True)
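To make the "4 sentences -> 8 annotations" remark concrete, here is a minimal pure-Python sketch of how a batch could be routed to tasks by its `multitask_id` labels. It is not flair's implementation: `split_batch_by_task` is a hypothetical helper, each sentence is represented only by its list of task ids, and the behaviour without `use_all_tasks` (one task sampled per sentence) is an assumption.

```python
import random
from collections import defaultdict


def split_batch_by_task(batch, use_all_tasks, rng=random):
    """Map each task id to the indices of the sentences it should train on.

    `batch` holds one list of task ids per sentence, standing in for the
    "multitask_id" labels attached to each sentence.
    """
    routing = defaultdict(list)
    for index, task_ids in enumerate(batch):
        if not task_ids:
            continue  # a sentence without task ids is routed to no task
        # Either use every task the sentence is marked for, or sample one (assumed).
        chosen = task_ids if use_all_tasks else [rng.choice(task_ids)]
        for task_id in chosen:
            routing[task_id].append(index)
    return dict(routing)


# Four sentences, each marked for both tasks (aligned annotations).
batch = [["Task_0", "Task_1"]] * 4

joint = split_batch_by_task(batch, use_all_tasks=True)
sampled = split_batch_by_task(batch, use_all_tasks=False, rng=random.Random(0))

# Count how many (sentence, task) pairs contribute to the loss in each mode.
joint_signals = sum(len(indices) for indices in joint.values())
sampled_signals = sum(len(indices) for indices in sampled.values())
print(joint_signals, sampled_signals)  # prints "8 4"
```

With joint routing every sentence feeds both task losses, which is where the doubled training signal per batch comes from.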
Thanks. A follow-up question: where do I change the training parameters? In the trainer object created with ModelTrainer, or in the individual classifiers? And where can I find documentation that explains how to train, fine-tune, and modify parameters for a multitask model?
@helpmefindaname Also, when I tried to use stacked embeddings with the multitask model I got the following error
`
embedding_types: List[TokenEmbeddings] = [
    FlairEmbeddings('/home/ec2-user/SageMaker/efs/karan/SearchCitation/models/embeddings/forward-best-lm.pt'),
    FlairEmbeddings('/home/ec2-user/SageMaker/efs/karan/SearchCitation/models/embeddings/backward-best-lm.pt'),
]
shared_embedding: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)
`
Error:
`
2023-05-02 20:48:11,386 ----------------------------------------------------------------------------------------------------
2023-05-02 20:48:11,386 Model: "MultitaskModel(
  (Task_0): TextClassifier(
    (embeddings): StackedEmbeddings(
      (list_embedding_0): FlairEmbeddings(
        (lm): LanguageModel(
          (drop): Dropout(p=0.1, inplace=False)
          (encoder): Embedding(275, 100)
          (rnn): LSTM(100, 1024)
        )
      )
      (list_embedding_1): FlairEmbeddings(
        (lm): LanguageModel(
          (drop): Dropout(p=0.1, inplace=False)
          (encoder): Embedding(275, 100)
          (rnn): LSTM(100, 1024)
        )
      )
    )
    (decoder): Linear(in_features=2048, out_features=3, bias=True)
    (dropout): Dropout(p=0.0, inplace=False)
    (locked_dropout): LockedDropout(p=0.0)
    (word_dropout): WordDropout(p=0.0)
    (loss_function): CrossEntropyLoss()
  )
  (Task_1): SequenceTagger(
    (embeddings): StackedEmbeddings(
      (list_embedding_0): FlairEmbeddings(
        (lm): LanguageModel(
          (drop): Dropout(p=0.1, inplace=False)
          (encoder): Embedding(275, 100)
          (rnn): LSTM(100, 1024)
        )
      )
      (list_embedding_1): FlairEmbeddings(
        (lm): LanguageModel(
          (drop): Dropout(p=0.1, inplace=False)
          (encoder): Embedding(275, 100)
          (rnn): LSTM(100, 1024)
        )
      )
    )
    (word_dropout): WordDropout(p=0.05)
    (locked_dropout): LockedDropout(p=0.5)
    (embedding2nn): Linear(in_features=2048, out_features=2048, bias=True)
    (rnn): LSTM(2048, 256, batch_first=True, bidirectional=True)
    (linear): Linear(in_features=512, out_features=3, bias=True)
    (loss_function): ViterbiLoss()
    (crf): CRF()
  )
)"
2023-05-02 20:48:11,387 ----------------------------------------------------------------------------------------------------
2023-05-02 20:48:11,388 Corpus: "MultiCorpus: 67680 train + 14500 dev + 14518 test sentences

RuntimeError                              Traceback (most recent call last)
Cell In[9], line 15
     10 trainer = ModelTrainer(multitask_model,
     11                        multicorpus,
     12                        )
     14 # trainer.fine_tune(f"/home/ec2-user/SageMaker/efs/karan/SearchCitation/models/Multitask/MultiCorp")
---> 15 trainer.train(f"/home/ec2-user/SageMaker/efs/karan/SearchCitation/models/Multitask/MultiCorp",
     16               learning_rate = 0.001)

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/flair/trainers/trainer.py:549, in ModelTrainer.train(self, base_path, learning_rate, mini_batch_size, eval_batch_size, mini_batch_chunk_size, max_epochs, train_with_dev, train_with_test, monitor_train, monitor_test, main_evaluation_metric, scheduler, anneal_factor, patience, min_learning_rate, initial_extra_patience, optimizer, cycle_momentum, warmup_fraction, embeddings_storage_mode, checkpoint, save_final_model, anneal_with_restarts, anneal_with_prestarts, anneal_against_dev_loss, batch_growth_annealing, shuffle, param_selection_mode, write_weights, num_workers, sampler, use_amp, amp_opt_level, eval_on_train_fraction, eval_on_train_shuffle, save_model_each_k_epochs, tensorboard_comment, use_swa, use_final_model_for_eval, gold_label_dictionary_for_eval, exclude_labels, create_file_logs, create_loss_file, epoch, use_tensorboard, tensorboard_log_dir, metrics_for_tensorboard, optimizer_state_dict, scheduler_state_dict, save_optimizer_state, reduce_transformer_vocab, shuffle_first_epoch, **kwargs)
    546 # forward and backward for batch
    547 for batch_step in batch_steps:
    548     # forward pass
--> 549     loss, datapoint_count = self.model.forward_loss(batch_step)
    550     average_over += datapoint_count
    551     # Backward

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/flair/models/multitask_model.py:74, in MultitaskModel.forward_loss(self, sentences)
     72 count = 0
     73 for task_id, split in batch_split.items():
---> 74     task_loss, task_count = self.tasks[task_id].forward_loss([sentences[i] for i in split])
     75     loss += self.loss_factors[task_id] * task_loss
     76     count += task_count

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/flair/nn/model.py:744, in DefaultClassifier.forward_loss(self, sentences)
    741 data_point_tensor = self._encode_data_points(sentences, data_points)
    743 # decode
--> 744 scores = self.decoder(data_point_tensor)
    746 # an optional masking step (no masking in most cases)
    747 scores = self._mask_scores(scores, data_points)

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x0 and 2048x3)
`
Also, where can I find proper documentation? I cannot find definitions of any of the classes.
@alanakbik @helpmefindaname any thoughts on this??
Hi @kdk2612, the example I gave you uses embeddings that work as both document embeddings and token embeddings. You are now using embeddings that are token embeddings only, so they won't work for the TextClassifier, which expects document embeddings. That is why the classifier's decoder receives an empty 32x0 input in your traceback while it expects 2048 features per sentence.
You can always check the source code and see the exact definitions.
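The shape mismatch in the traceback above can be reproduced without any libraries. The sketch below is only an illustration: `matmul_shape` is a hypothetical helper, and the shapes are taken from the error message and the model printout (a batch of 32 sentences, a decoder with 2048 input features and 3 outputs).

```python
def matmul_shape(left, right):
    """Return the shape of left @ right, or raise if the inner dimensions differ."""
    rows, inner_left = left
    inner_right, cols = right
    if inner_left != inner_right:
        raise ValueError(
            "mat1 and mat2 shapes cannot be multiplied "
            f"({rows}x{inner_left} and {inner_right}x{cols})"
        )
    return (rows, cols)


batch_size = 32
decoder_weight = (2048, 3)  # written as in the error message: in_features x out_features

# Token-only embeddings give the classifier no document vector, so its input has width 0.
try:
    matmul_shape((batch_size, 0), decoder_weight)
except ValueError as error:
    message = str(error)

# A document embedding of the expected width multiplies cleanly.
ok_shape = matmul_shape((batch_size, 2048), decoder_weight)
print(message)
print(ok_shape)
```

The first call fails with the same message as the traceback, while the second yields one score per class for each of the 32 sentences.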
What is the correct way to train two tasks on a single corpus? I tried to train two models together using a single corpus, but got this error: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
multitask_dataset = CONLL_03_DUTCH()
tasks = ['ner', 'pos']
model_1 = initialize_tagger(multitask_dataset, shared_embedding, tasks[0])
model_2 = initialize_tagger(multitask_dataset, shared_embedding, tasks[1])
multitask_model = MultitaskModel([model_1, model_2], use_all_tasks=True, task_ids=tasks)
trainer = ModelTrainer(multitask_model, multitask_dataset)
trainer.fine_tune('resources/taggers/sota-ner-flert',
learning_rate=5.0e-6,
max_epochs=20)
╭─────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮
│ in
Question
Hi, I am interested in doing Multi-task learning for NER and Sentence classification on a custom dataset. I was trying to find some tutorials or starting points but could not find any. Can you please provide some guidance on how to approach this?