glample / tagger

Named Entity Recognition Tool
Apache License 2.0

EVALUATE #57

Closed binhna closed 6 years ago

binhna commented 7 years ago

I don't know how to evaluate my model (the folder that was created after running train.py). Can anybody please help me?

Rabia-Noureen commented 6 years ago

Hi @binhna, I am new to Python. Can you please help me train the model using the GoogleNews word embeddings? I am trying to train with this command:

python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb --lr_method=adam --tag_scheme=iob --pre_emb=GoogleNews-vectors-negative300.bin --all_emb=300

I got this error: image

I have been stuck on this issue for about 2 months and haven't been able to resolve it. Thanks in advance.

binhna commented 6 years ago

Can you show me the first 3 lines of the word embedding that you are using?
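One way to get those lines is a short script like the sketch below. It assumes the embeddings are a plain-text file with one word per line followed by its vector components. The helper names `peek_embeddings` and `looks_like_text_vectors` are illustrative and not part of tagger. A binary file, such as a word2vec `.bin`, will show up as unreadable characters here, which is itself a useful signal.

```python
import io


def peek_embeddings(path, n_lines=3, max_chars=120):
    """Return the first n_lines of an embedding file, truncated for display."""
    lines = []
    # errors="replace" keeps the script from crashing on binary content
    with io.open(path, "r", encoding="utf-8", errors="replace") as f:
        for _ in range(n_lines):
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip("\r\n")[:max_chars])
    return lines


def looks_like_text_vectors(line, expected_dim=None):
    """Check that a line is a token followed by numbers (optionally expected_dim of them)."""
    parts = line.split()
    if len(parts) < 2:
        return False
    try:
        [float(x) for x in parts[1:]]
    except ValueError:
        return False
    return expected_dim is None or len(parts) - 1 == expected_dim


if __name__ == "__main__":
    import sys
    for line in peek_embeddings(sys.argv[1]):
        print(line)
```

Run it as `python peek.py <embedding file>` (the script name is arbitrary) and paste the output into the thread.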

Rabia-Noureen commented 6 years ago

@binhna Thanks a lot for your response. I am using word2vec-GoogleNews-vectors from the link below. It's a .bin.gz file: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit

Rabia-Noureen commented 6 years ago

@binhna Sorry to disturb you again, but I skipped the word2vec-GoogleNews-vectors file and the other parameters and tried to train the model on the provided dataset, and I got another error. Am I doing something wrong while training?

(env_name27) C:\Users\Acer\tagger-master>python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb --tag_scheme=iob
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL: https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GT 620M (CNMeM is enabled with initial size: 85.0% of memory, cuDNN not available)
Model location: ./models
Found 23624 unique words (203621 in total)
Found 84 unique characters
Found 9 unique named entity tags
14041 / 3250 / 3453 sentences in train / dev / test.
Saving the mappings to disk...
Compiling...
Starting epoch 0...
50, cost average: 14.516134
100, cost average: 8.294904
150, cost average: 14.409883
200, cost average: 11.035920
250, cost average: 14.829118
300, cost average: 8.705193
350, cost average: 10.033119
400, cost average: 10.041572
450, cost average: 11.864815
500, cost average: 10.191026
550, cost average: 11.418326
600, cost average: 10.012394
650, cost average: 10.535731
700, cost average: 12.022213
750, cost average: 10.865187
800, cost average: 10.012271
850, cost average: 10.825798
900, cost average: 12.069555
950, cost average: 11.846591
'.' is not recognized as an internal or external command, operable program or batch file.
ID  NE      Total  O      B-LOC  B-PER  B-ORG  I-PER  I-ORG  B-MISC  I-LOC  I-MISC  Percent
0   O       42759  42759  0      0      0      0      0      0       0      0       100.000
1   B-LOC   1837   1837   0      0      0      0      0      0       0      0       0.000
2   B-PER   1842   1842   0      0      0      0      0      0       0      0       0.000
3   B-ORG   1341   1341   0      0      0      0      0      0       0      0       0.000
4   I-PER   1307   1307   0      0      0      0      0      0       0      0       0.000
5   I-ORG   751    751    0      0      0      0      0      0       0      0       0.000
6   B-MISC  922    922    0      0      0      0      0      0       0      0       0.000
7   I-LOC   257    257    0      0      0      0      0      0       0      0       0.000
8   I-MISC  346    346    0      0      0      0      0      0       0      0       0.000
42759/51362 (83.25026%)
Traceback (most recent call last):
  File "train.py", line 220, in <module>
    dev_data, id_to_tag, dico_tags)
  File "C:\Users\Acer\tagger-master\utils.py", line 282, in evaluate
    return float(eval_lines[1].strip().split()[-1])
IndexError: list index out of range

glample commented 6 years ago

Hey, sorry for the delay. You can do this by running the train.py script again, using the reload parameter. You will have to edit the code of train.py a bit to skip training and directly go to the evaluation part.
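For readers who already have gold and predicted tag sequences and only need the scores, entity-level precision, recall and F1 can also be computed in plain Python. The sketch below is not the repository's evaluation code. It is a standalone approximation that assumes IOB2-style tags (`B-`/`I-` prefixes, as in the tag list in the log above), and the function names `extract_entities` and `entity_f1` are hypothetical.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a list of IOB2 tags."""
    entities = set()
    start, etype = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and etype == label
        if start is not None and not continues:
            entities.add((start, i - 1, etype))
            start, etype = None, None
        # "B-" always opens a span; a stray "I-" is leniently treated as an opening.
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return entities


def entity_f1(gold_sentences, pred_sentences):
    """Compute entity-level (precision, recall, F1) over parallel tag sequences."""
    n_gold = n_pred = n_correct = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_entities(gold), extract_entities(pred)
        n_gold += len(g)
        n_pred += len(p)
        n_correct += len(g & p)  # exact span and type match
    precision = float(n_correct) / n_pred if n_pred else 0.0
    recall = float(n_correct) / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["O", "O", "O", "O"]]
    print(entity_f1(gold, pred))
```

The usage example mirrors the log above, where every token is predicted as O: token accuracy can look high (83.25% there) while no entity is actually found, which is why span-level scores are the more informative number.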