glample / tagger

Named Entity Recognition Tool
Apache License 2.0

Running time with GPU #53

Closed HaniehP closed 7 years ago

HaniehP commented 7 years ago

Hi, how much time will be saved by running this program on GPU rather than CPU?

glample commented 7 years ago

Hi,

This will be slower on GPU than on CPU, mostly because of the operations in the CRF layer, I guess, and also because the implementation does not support mini-batches.
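A toy NumPy sketch (not taken from this repository) of why the mini-batch point matters: a single batched matrix product does the same arithmetic as a per-example Python loop, but lets one large BLAS/GPU kernel amortize the per-call overhead.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 100)).astype(np.float32)
xs = rng.standard_normal((64, 100)).astype(np.float32)  # 64 toy "sentences"

# unbatched: one small matrix-vector product per example,
# paying Python-loop and kernel-launch overhead each time
looped = np.stack([W @ x for x in xs])

# batched: one large matrix-matrix product over the whole mini-batch
batched = xs @ W.T

# both compute exactly the same result
assert np.allclose(looped, batched, atol=1e-4)
```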

Rabia-Noureen commented 7 years ago

Hi @glample @HaniehP, I am new to Python. Can you please help me with training the model using the GoogleNews word embeddings? I am trying to train with this command:

python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb --lr_method=adam --tag_scheme=iob --pre_emb=GoogleNews-vectors-negative300.bin --all_emb=300

I got this error: [screenshot]

I have been stuck on this issue for about two months and couldn't resolve it. Thanks in advance.
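As a side note, one likely culprit: the --pre_emb loader appears to expect a plain-text embeddings file, while GoogleNews-vectors-negative300.bin is in the binary word2vec format. A minimal sketch of a binary-to-text conversion (convert_word2vec_bin_to_txt is a hypothetical helper, not part of this repo), demonstrated on a tiny toy file:

```python
import os, struct, tempfile

def convert_word2vec_bin_to_txt(bin_path, txt_path, encoding="ISO-8859-1"):
    """Rewrite a binary word2vec file in the plain-text format:
    one 'word v1 v2 ... vD' line per vocabulary entry."""
    with open(bin_path, "rb") as f:
        # header line: "<vocab_size> <dim>"
        vocab_size, dim = map(int, f.readline().split())
        with open(txt_path, "w", encoding=encoding) as out:
            for _ in range(vocab_size):
                # the word is the bytes up to the separating space
                chars = []
                while True:
                    ch = f.read(1)
                    if ch == b" ":
                        break
                    if ch != b"\n":  # skip entry-separating newlines
                        chars.append(ch)
                word = b"".join(chars).decode(encoding)
                # the vector is dim little-endian float32 values
                vec = struct.unpack("<%df" % dim, f.read(4 * dim))
                out.write(word + " " + " ".join("%f" % v for v in vec) + "\n")

# round-trip demo on a 2-word, 3-dimensional toy file
tmp = tempfile.mkdtemp()
bin_path = os.path.join(tmp, "toy.bin")
txt_path = os.path.join(tmp, "toy.txt")
with open(bin_path, "wb") as f:
    f.write(b"2 3\n")
    f.write(b"cat " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n")
    f.write(b"dog " + struct.pack("<3f", 4.0, 5.0, 6.0) + b"\n")
convert_word2vec_bin_to_txt(bin_path, txt_path)
lines = open(txt_path, encoding="ISO-8859-1").read().splitlines()
```

(The gensim library can also do this conversion, if installed.)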

HaniehP commented 7 years ago

Try 'ISO-8859-1' instead of 'UTF-8'. That helped me in another project.

Rabia-Noureen commented 7 years ago

@HaniehP Thank you so much for your response. In the meantime, when I tried to train the model without the word embeddings, I got another error; something seems wrong:

(env_name27) C:\Users\Acer\tagger-master>python train.py --train dataset/eng.train --dev dataset/eng.testa --test dataset/eng.testb
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL: https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GT 620M (CNMeM is enabled with initial size: 85.0% of memory, cuDNN not available)
Model location: ./models
Found 23624 unique words (203621 in total)
Found 84 unique characters
Found 17 unique named entity tags
14041 / 3250 / 3453 sentences in train / dev / test.
Saving the mappings to disk...
Compiling...
Starting epoch 0...
50, cost average: 15.406189
100, cost average: 11.704297
150, cost average: 10.767459
200, cost average: 13.812738
250, cost average: 11.460194
300, cost average: 13.207466
350, cost average: 12.146099
400, cost average: 12.428576
450, cost average: 10.977689
500, cost average: 12.830771
550, cost average: 10.062991
600, cost average: 9.834551
650, cost average: 11.481623
700, cost average: 9.460655
750, cost average: 9.907359
800, cost average: 10.251657
850, cost average: 10.405848
900, cost average: 14.113665
950, cost average: 10.436158
'.' is not recognized as an internal or external command, operable program or batch file.
ID NE Total O S-LOC B-PER E-PER S-ORG S-MISC B-ORG E-ORG S-PER I-ORG B-LOC E-LOC B-MISC E-MISC I-MISC I-PER I-LOC Percent
0 O 42759 42759 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100.000
1 S-LOC 1603 1603 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
2 B-PER 1234 1234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
3 E-PER 1234 1234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
4 S-ORG 891 891 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
5 S-MISC 665 665 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
6 B-ORG 450 450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
7 E-ORG 450 450 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
8 S-PER 608 608 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
9 I-ORG 301 301 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
10 B-LOC 234 234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
11 E-LOC 234 234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
12 B-MISC 257 257 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
13 E-MISC 257 257 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
14 I-MISC 89 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
15 I-PER 73 73 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
16 I-LOC 23 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.000
42759/51362 (83.25026%)
Traceback (most recent call last):
  File "train.py", line 220, in <module>
    dev_data, id_to_tag, dico_tags)
  File "C:\Users\Acer\tagger-master\utils.py", line 282, in evaluate
    return float(eval_lines[1].strip().split()[-1])
IndexError: list index out of range
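The line "'.' is not recognized as an internal or external command" suggests the CoNLL evaluation script (invoked via a ./ path) never actually ran on Windows, so the scores output that utils.evaluate parses is empty and eval_lines[1] is out of range. A hedged sketch of a friendlier guard around that parse (parse_conlleval_scores and the sample lines are illustrative, not from the repo):

```python
def parse_conlleval_scores(eval_lines):
    """Pull the overall FB1 score out of conlleval's output lines,
    failing loudly when the script produced no output at all
    (e.g. when './conlleval' could not be executed on Windows)."""
    if len(eval_lines) < 2:
        raise RuntimeError(
            "conlleval produced no output -- on Windows, try invoking it "
            "through perl explicitly, e.g. perl evaluation/conlleval")
    # the second line of conlleval output carries the overall scores;
    # the last whitespace-separated field is the FB1 value
    return float(eval_lines[1].strip().split()[-1])

# illustrative conlleval-style output for an all-O prediction
scores = [
    "processed 51362 tokens with 5648 phrases; found: 0 phrases; correct: 0.",
    "accuracy:  83.25%; precision:   0.00%; recall:   0.00%; FB1:   0.00",
]
fb1 = parse_conlleval_scores(scores)
```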

Rabia-Noureen commented 7 years ago

@HaniehP Please guide me: how can I convert my word embeddings .txt file to 'ISO-8859-1'?

HaniehP commented 7 years ago

In your Python code, replace "codecs.open(path, 'r', 'utf8')" with "codecs.open(path, 'r', 'ISO-8859-1')".
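A small self-contained illustration of why this helps, on a toy file (not the real embeddings): ISO-8859-1 assigns a character to every byte value, so reading always succeeds, whereas strict UTF-8 raises UnicodeDecodeError on bytes that do not form a valid sequence.

```python
import codecs, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "emb.txt")
with open(path, "wb") as f:
    # 0xE9 is 'é' in ISO-8859-1 but is not a valid UTF-8 sequence on its own
    f.write(b"caf\xe9 0.1 0.2 0.3\n")

try:
    codecs.open(path, "r", "utf8").read()
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False  # strict UTF-8 rejects the stray 0xE9 byte

# ISO-8859-1 maps every byte to a character, so this read cannot fail
line = codecs.open(path, "r", "ISO-8859-1").read()
```

Note this only sidesteps the decode error; if the file is genuinely binary (like a .bin embeddings file), the decoded text will still not be a usable embeddings table.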