flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

Runtime Error using TransformerWordEmbeddings #1600

Closed SinghTejwinder closed 4 years ago

SinghTejwinder commented 4 years ago

Hi,

I am trying to use TransformerWordEmbeddings for a NER tagging task, but I am getting a runtime error:

[screenshot: Error]

I have written this code:

[screenshot: code]

alanakbik commented 4 years ago

Hello @SinghTejwinder, we've just merged a PR that should fix these issues. Could you update your local version and try again?
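
For anyone following along: if flair was installed from PyPI, the merged fix only becomes available by installing from the repository. One common way (assuming a standard pip setup) is:

pip install --upgrade git+https://github.com/flairNLP/flair.git

or, if you work from a local clone, run git pull and then pip install -e . inside the clone.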

sumyatthitsarr commented 4 years ago

I still get this error even after cloning the master branch:

RuntimeError: shape '[2, 13, 1536]' is invalid for input of size 30720
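
For what it's worth, the numbers in that message line up with token embeddings going missing: the tagger reshapes the embedding tensor into [batch size, longest sentence length, embedding length], but the embedding layer produced fewer token vectors than the batch has tokens. A quick sanity check, reading the shapes straight out of the message (plain arithmetic, not flair code):

embedding_length = 1536               # 2 * 768, consistent with 'first_last' pooling over a base-size BERT (see the script below)
expected = 2 * 13 * embedding_length  # batch of 2, longest sentence 13 tokens -> 39936 elements
actual = 30720                        # elements actually produced
print(actual // embedding_length)     # 20 token vectors instead of 2 * 13 = 26, so 6 are missing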

alanakbik commented 4 years ago

Can you post the script to reproduce?

sumyatthitsarr commented 4 years ago

@alanakbik sure!

from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
from flair.visual.training_curves import Plotter
import sys
import flair, torch

flair.device = torch.device('cuda:0')

# column format of the data files: first column is the token, second the tag
columns = {0: 'text', 1: 'ner'}
data_folder = './data/Model_Trainer_Data'
corpus: Corpus = ColumnCorpus(
    data_folder,
    columns,
    train_file=sys.argv[1],
    dev_file=sys.argv[3],
    test_file=sys.argv[4],
    in_memory=False,
)

tag_type = 'ner'  # must match the tag column declared above
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
# bert_embedding = BertEmbeddings('bert-base-multilingual-cased')

embeddings = TransformerWordEmbeddings(
    'bert-base-multilingual-cased',  # which transformer model
    layers='-1',                     # which layers (here: only the last layer when fine-tuning)
    pooling_operation='first_last',  # how to pool over subtokens of a split token
    fine_tune=True,                  # whether or not to fine-tune the transformer weights
)

tagger: SequenceTagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type=tag_type,
    use_crf=False,
    use_rnn=False,
)

# use the Adam optimizer when fine-tuning
trainer = ModelTrainer(tagger, corpus, optimizer=torch.optim.Adam)

trainer.train('./results/' + sys.argv[2],
              learning_rate=3e-5,
              embeddings_storage_mode='gpu',
              mini_batch_chunk_size=2,  # set this if you get OOM errors
              max_epochs=10,            # very few epochs of fine-tuning
              train_with_dev=True,
              checkpoint=True)

plotter = Plotter()
# plotter.plot_training_curves('./results/model2/loss.tsv')
plotter.plot_weights('./results/' + sys.argv[2] + '/weights.txt')
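
Side note on the numbers: bert-base-multilingual-cased has a hidden size of 768, and pooling_operation='first_last' concatenates the first and last subtoken vectors of each token, so each token embedding has length 2 × 768 = 1536, exactly the last dimension in the shape '[2, 13, 1536]' reported above. A quick way to verify (standalone snippet; downloads the model on first use):

from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings(
    'bert-base-multilingual-cased',
    layers='-1',
    pooling_operation='first_last',
)
print(embeddings.embedding_length)  # 1536 = 2 * 768
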
alanakbik commented 4 years ago

Can you also share the data or use a public dataset so I can reproduce?

SinghTejwinder commented 4 years ago

@alanakbik I updated my local version and tried again, but this time I got this:

2020-05-12 15:28:09,742 epoch 1 - iter 8/88 - loss 361.28995132 - samples/sec: 7.52
2020-05-12 15:28:09,946 Tokenization MISMATCH in sentence 'Note that the presented architecture works at the frame level , meaning that each single frame ( plus its corresponding context ) is fed-forward through the network , obtaining a class posterior probability for all of the target languages . This fact makes the DNNs particularly suitable for real-time applications because , unlike other approaches ( i.e. i-vectors ) , we can potentially make a decision about the language at each new frame . Indeed , at each frame , we can combine the evidence from past frames to get a single similarity score between the test utterance and the targetlanguages . A simple way of doing this combination is to assume that frames are independent and multiply the posterior estimates of the last layer . The score sl for language l of a given test utterance is computed by multiplying the output probabilities pl obtained for all of its frames ; or equivalently , accumulating the logs as :( 6 ) sl = 1N ∑ t = 1Nlogp ( Ll | xt ​ , θ ) where p ( Ll | xt ​ , θ ) represents the class probability output for the language l corresponding to the input example at time t , xt by using the DNN defined by parameters θ .'
2020-05-12 15:28:09,946 Last matched: 'Token: 174 ​'
2020-05-12 15:28:09,946 Last sentence: 'Token: 215 .'
2020-05-12 15:28:09,946 subtokenized: '['[CLS]', 'Note', 'that', 'the', 'presented', 'architecture', 'works', 'at', 'the', 'frame', 'level', ',', 'meaning', 'that', 'each', 'single', 'frame', '(', 'plus', 'its', 'corresponding', 'context', ')', 'is', 'fed', '-', 'forward', 'through', 'the', 'network', ',', 'obtaining', 'a', 'class', 'posterior', 'probability', 'for', 'all', 'of', 'the', 'target', 'languages', '.', 'This', 'fact', 'makes', 'the', 'D', '##N', '##N', '##s', 'particularly', 'suitable', 'for', 'real', '-', 'time', 'applications', 'because', ',', 'unlike', 'other', 'approaches', '(', 'i', '.', 'e', '.', 'i', '-', 'vectors', ')', ',', 'we', 'can', 'potentially', 'make', 'a', 'decision', 'about', 'the', 'language', 'at', 'each', 'new', 'frame', '.', 'Indeed', ',', 'at', 'each', 'frame', ',', 'we', 'can', 'combine', 'the', 'evidence', 'from', 'past', 'frames', 'to', 'get', 'a', 'single', 'similarity', 'score', 'between', 'the', 'test', 'utter', '##ance', 'and', 'the', 'target', '##lang', '##ua', '##ges', '.', 'A', 'simple', 'way', 'of', 'doing', 'this', 'combination', 'is', 'to', 'assume', 'that', 'frames', 'are', 'independent', 'and', 'multi', '##p', '##ly', 'the', 'posterior', 'estimates', 'of', 'the', 'last', 'layer', '.', 'The', 'score', 's', '##l', 'for', 'language', 'l', 'of', 'a', 'given', 'test', 'utter', '##ance', 'is', 'com', '##puted', 'by', 'multi', '##p', '##lying', 'the', 'output', 'pro', '##ba', '##bilities', 'p', '##l', 'obtained', 'for', 'all', 'of', 'its', 'frames', ';', 'or', 'equivalent', '##ly', ',', 'a', '##cc', '##um', '##ulating', 'the', 'logs', 'as', ':', '(', '6', ')', 's', '##l', '=', '1', '##N', '[UNK]', 't', '=', '1', '##N', '##log', '##p', '(', 'L', '##l', '|', 'x', '##t', ',', 'θ', ')', 'where', 'p', '(', 'L', '##l', '|', 'x', '##t', ',', 'θ', ')', 'represents', 'the', 'class', 'probability', 'output', 'for', 'the', 'language', 'l', 'corresponding', 'to', 'the', 'input', 'example', 'at', 'time', 't', ',', 'x', '##t', 'by', 'using', 'the', 'D', '##N', '##N', 'defined', 'by', 'parameters', 'θ', '.', '[SEP]']'
Traceback (most recent call last):
  File "test_span.py", line 48, in <module>
    patience=4)
  File "/media/data_dump_1/Tejwinder/keyphrase-extraction/flair/flair/trainers/trainer.py", line 345, in train
    loss = self.model.forward_loss(batch_step)
  File "/media/data_dump_1/Tejwinder/keyphrase-extraction/flair/flair/models/sequence_tagger_model.py", line 499, in forward_loss
    features = self.forward(data_points)
  File "/media/data_dump_1/Tejwinder/keyphrase-extraction/flair/flair/models/sequence_tagger_model.py", line 532, in forward
    self.embeddings.embedding_length,
RuntimeError: shape '[4, 215, 9984]' is invalid for input of size 8166912

I am not able to figure out the problem.

alanakbik commented 4 years ago

@SinghTejwinder thanks for pasting the error message. It looks like your string contains a zero-width space, which caused an error in token matching. I'll push a fix in a few minutes.
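
Until the fix lands, stripping zero-width characters from the raw text before building the corpus should avoid the mismatch. A minimal sketch of such a pre-processing step (the code-point list and the clean_text helper are assumptions of mine, not part of flair):

import re

# zero-width space, zero-width non-joiner/joiner, word joiner, zero-width no-break space (BOM)
ZERO_WIDTH = re.compile('[\u200b\u200c\u200d\u2060\ufeff]')

def clean_text(text: str) -> str:
    """Remove zero-width characters that can break subtoken-to-token alignment."""
    return ZERO_WIDTH.sub('', text)

print(clean_text('target\u200blanguages'))  # -> 'targetlanguages'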

alanakbik commented 4 years ago

Merged the PR. Could you update and try again?

SinghTejwinder commented 4 years ago

Still getting the error.

alanakbik commented 4 years ago

Same error message?

SinghTejwinder commented 4 years ago

Tokenization MISMATCH in sentence 'Absorption events through the charged current reactions ( 2 ) e + 40Ar e + 40K and_ e + 40Ar e ++ 40Cl . There is some uncertainty in predicting e ( e +) event rates for these processes which arise due to the nuclear model dependencies of the absorption cross section and the treatment of the Coulomb distortion of electron ( positron ) in the field of the residual nucleus . The nuclear absorption cross section for the charged current neutrino reactions in 40Ar relevant to supernova neutrino energies was first calculated by Raghavan [ 10 ] and Bahcall et al . [ 11 ] for Fermi transitions leading to isobaric analogue state ( IAS ) at 4.38 MeV in 40K . Later Ormand et al . [ 12 ] used a shell model to calculate the Fermi and Gamow Teller transitions . In these calculations Fermi function F ( Z,Ee ) was used to take into account the Coulomb effects . In a recent paper Bueno et al . [ 13 ] make use of a calculation by Martinez-Pinedo et al . [ 14 ] who use a shell model for Fermi and Gamow Teller transitions and a continuum random phase approximation ( CRPA ) for forbidden transitions to calculate the absorption cross sections . In this calculation the Coulomb distortion of the produced electron is treated with a hybrid model where a Fermi function is used for lower electron energies and modified effective momentum approximation ( MEMA ) for higher electron energies [ 14 17 ] . In a recent work Bhattacharya et al . [ 18 ] have measured the Fermi and Gamow Teller transition strengths leading to excited states up to 6 MeV in 40K and obtained the neutrino absorption cross section for supernova neutrinos in 40Ar .'
2020-05-16 12:14:06,437 Last matched: 'Token: 20'
2020-05-16 12:14:06,437 Last sentence: 'Token: 314 .'
2020-05-16 12:14:06,437 subtokenized: '['[CLS]', 'absorption', 'events', 'through', 'the', 'charged', 'current', 'reactions', '(', '2', ')', '', '##e', '+', '40', '##ar', '', 'e', '', '+', '40', '##k', '', 'and', '##', 'e', '+', '40', '##ar', '', 'e', '+', '+', '40', '##c', '##l', '', '.', 'there', 'is', 'some', 'uncertainty', 'in', 'predict', '##ing', 'e', '', '(', 'e', '+', ')', 'event', 'rates', 'for', 'these', 'processes', 'which', 'arise', 'due', 'to', 'the', 'nuclear', 'model', 'depend', '##encies', 'of', 'the', 'absorption', 'cross', 'section', 'and', 'the', 'treatment', 'of', 'the', 'co', '##ulo', '##mb', 'distortion', 'of', 'electron', '(', 'p', '##os', '##it', '##ron', ')', 'in', 'the', 'field', 'of', 'the', 'residual', 'nucleus', '.', 'the', 'nuclear', 'absorption', 'cross', 'section', 'for', 'the', 'charged', 'current', 'ne', '##ut', '##rino', 'reactions', 'in', '40', '##ar', 'relevant', 'to', 'super', '##nova', 'ne', '##ut', '##rino', 'energies', 'was', 'first', 'calculated', 'by', 'rag', '##ha', '##van', '[', '10', ']', 'and', 'b', '##ah', '##cal', '##l', 'et', 'al', '.', '[', '11', ']', 'for', 'f', '##er', '##mi', 'transitions', 'leading', 'to', 'is', '##ari', '##c', 'analogue', 'state', '(', 'i', '##as', ')', 'at', '4', '.', '38', 'me', '##v', 'in', '40', '##k', '', '.', 'later', 'or', '##mand', 'et', 'al', '.', '[', '12', ']', 'used', 'a', 'shell', 'model', 'to', 'calculate', 'the', 'f', '##er', '##mi', 'and', 'g', '##amo', '##w', '', 'tell', '##er', 'transitions', '.', 'in', 'these', 'calculations', 'f', '##er', '##mi', 'function', 'f', '(', 'z', ',', 'e', '##e', ')', 'was', 'used', 'to', 'take', 'into', 'account', 'the', 'co', '##ulo', '##mb', 'effects', '.', 'in', 'a', 'recent', 'paper', 'b', '##uen', '##o', 'et', 'al', '.', '[', '13', ']', 'make', 'use', 'of', 'a', 'calculation', 'by', 'ma', '##rt', '##ine', '##z', '-', 'pine', '##do', 'et', 'al', '.', '[', '14', ']', 'who', 'use', 'a', 'shell', 'model', 'for', 'f', '##er', '##mi', 'and', 'g', '##amo', '##w', '', 'tell', '##er', 'transitions', 'and', 'a', 'con', '##tinuum', 'random', 'phase', 'approximation', '(', 'c', '##rp', '##a', ')', 'for', 'forbidden', 'transitions', 'to', 'calculate', 'the', 'absorption', 'cross', 'sections', '.', 'in', 'this', 'calculation', 'the', 'co', '##ulo', '##mb', 'distortion', 'of', 'the', 'produced', 'electron', 'is', 'treated', 'with', 'a', 'hybrid', 'model', 'where', 'a', 'f', '##er', '##mi', 'function', 'is', 'used', 'for', 'lower', 'electron', 'energies', 'and', 'modified', 'effective', 'momentum', 'approximation', '(', 'me', '##ma', ')', 'for', 'higher', 'electron', 'energies', '[', '14', '', '17', ']', '.', 'in', 'a', 'recent', 'work', 'b', '##hat', '##ta', '##charya', 'et', 'al', '.', '[', '18', ']', 'have', 'measured', 'the', 'f', '##er', '##mi', 'and', 'g', '##amo', '##w', '', 'tell', '##er', 'transition', 'strengths', 'leading', 'to', 'excited', 'states', 'up', 'to', '6', 'me', '##v', 'in', '40', '##k', '_', 'and', 'obtained', 'the', 'ne', '##ut', '##rino', 'absorption', 'cross', 'section', 'for', 'super', '##nova', 'ne', '##ut', '##rino', '##s', 'in', '40', '##ar', '.', '[SEP]']'
Traceback (most recent call last):
  File "train_new.py", line 815, in <module>
    trainer=train(dataset_path,[embedding],output,args)
  File "train_new.py", line 706, in train
    num_workers=args.threads,
  File "/media/data_dump_1/Tejwinder/keyphrase-extraction/flair/flair/trainers/trainer.py", line 345, in train
    loss = self.model.forward_loss(batch_step)
  File "/media/data_dump_1/Tejwinder/keyphrase-extraction/flair/flair/models/sequence_tagger_model.py", line 499, in forward_loss
    features = self.forward(data_points)
  File "/media/data_dump_1/Tejwinder/keyphrase-extraction/flair/flair/models/sequence_tagger_model.py", line 532, in forward
    self.embeddings.embedding_length,
RuntimeError: shape '[4, 314, 9984]' is invalid for input of size 9594624

alanakbik commented 4 years ago

Hello @SinghTejwinder, I cannot reproduce this error on current master; the above sentence works for me. Are you sure you're on current master? Or are there special symbols in the sentence that are not displayed online?
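
One quick way to check which flair is actually being imported, in case an older pip install shadows the clone:

import flair

print(flair.__version__)  # version of the package that Python picked up
print(flair.__file__)     # the path shows whether the clone or a site-packages install is loaded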

SinghTejwinder commented 4 years ago

I think the special symbols are not being displayed here. Could you please check with this dataset: https://github.com/midas-research/keyphrase-extraction-as-sequence-labeling-data/tree/master/SemEval-2017

Also, this problem was not there when I was using an earlier version of flair (< 0.4.5).

alanakbik commented 4 years ago

The dataset works for me; I can train models with various transformer embeddings and get no errors.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

wahid18benz commented 3 years ago

@Wickky can you please tell me how you chose the TransformerWordEmbeddings parameters and the learning_rate? Thank you.