Hironsan / anago

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
https://anago.herokuapp.com/
MIT License

KeyError: 'O"' #52

Closed OsamaBSalameh closed 6 years ago

OsamaBSalameh commented 6 years ago

Hi @Hironsan

Thanks for your effort in building such a fine tool.

I'm using anaGo on Arabic text for a school project, but I've run into a problem.

These are the datasets I used:

1) WikiFANE_Gold: Gold standard Wikipedia-based Fine-grained Arabic Named Entity Corpus, ~500K tokens
2) NewsFANE_Gold: Gold standard Newswire-based Fine-grained Arabic Named Entity Corpus, ~170K tokens

This is the error:

Epoch 1/15
789/789 [==============================] - 5196s 7s/step - loss: 337.9646
Traceback (most recent call last):
  File "C:/Users/Osama Bani Salameh/PycharmProjects/anaGoTest/anaGoTestMainFile.py", line 11, in <module>
    model.train(x_train, y_train, x_valid, y_valid)
  File "C:\Program Files\Python36\lib\site-packages\anago\wrapper.py", line 50, in train
    trainer.train(x_train, y_train, x_valid, y_valid)
  File "C:\Program Files\Python36\lib\site-packages\anago\trainer.py", line 51, in train
    callbacks=callbacks)
  File "C:\Program Files\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 2262, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "C:\Program Files\Python36\lib\site-packages\keras\callbacks.py", line 77, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "C:\Program Files\Python36\lib\site-packages\anago\metrics.py", line 124, in on_epoch_end
    for i, (data, label) in enumerate(self.valid_batches):
  File "C:\Program Files\Python36\lib\site-packages\anago\reader.py", line 137, in data_generator
    yield preprocessor.transform(X, y)
  File "C:\Program Files\Python36\lib\site-packages\anago\preprocess.py", line 112, in transform
    y = [[self.vocab_tag[t] for t in sent] for sent in y]
  File "C:\Program Files\Python36\lib\site-packages\anago\preprocess.py", line 112, in <listcomp>
    y = [[self.vocab_tag[t] for t in sent] for sent in y]
  File "C:\Program Files\Python36\lib\site-packages\anago\preprocess.py", line 112, in <listcomp>
    y = [[self.vocab_tag[t] for t in sent] for sent in y]
KeyError: 'O"'

By the way, the dataset does contain the O label, so do you know what's going on?

Regards, Osama Bani Salameh

Hironsan commented 6 years ago

Did you use the Fine-grained Arabic Named Entity Corpora for training?

I trained the model on that dataset and it ran without errors:

venv ❯ python train_example.py --data_path=WIKIFANE_selective.txt
Loading datasets...
Transforming datasets...
149369
Building a model...
Training the model...

Epoch 1/1
1607/1607 [==============================] - 13867s 9s/step - loss: 492.3222
 - f1: 72.06
Saving the model...

Try using the code in the ver1.0.0 branch.
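
A note on the traceback: the error is raised in on_epoch_end while iterating over valid_batches, at the lookup self.vocab_tag[t]. One plausible reading, assuming the tag vocabulary is built from the training labels only, is that the validation split contains a malformed tag (O followed by a stray double quote) that never appears in the training split. The sketch below is not part of anaGo; find_unseen_tags and strip_stray_quotes are hypothetical helper names, and the toy labels are made up for illustration. It checks for such tags and normalizes them before training.

```python
from collections import Counter


def find_unseen_tags(y_train, y_valid):
    """Count validation tags that never occur in the training tags."""
    known = {tag for sent in y_train for tag in sent}
    return Counter(tag for sent in y_valid for tag in sent if tag not in known)


def strip_stray_quotes(y):
    """Return a copy of the tag sequences with surrounding whitespace and quotes removed."""
    return [[tag.strip().strip("\"'") for tag in sent] for sent in y]


# Toy labels (illustrative only, not taken from the corpora in this thread).
y_train = [["B-PER", "O"], ["O", "B-LOC"]]
y_valid = [["O", 'O"'], ["B-PER", "O"]]

# Tags present in validation but missing from training would trigger the KeyError.
unseen = find_unseen_tags(y_train, y_valid)
print(unseen)

# After normalizing, no unseen tags should remain in this toy example.
cleaned = strip_stray_quotes(y_valid)
print(find_unseen_tags(y_train, cleaned))
```

If the first check reports anything on the real data, the source file most likely has a quoting or delimiter issue on those lines, so it is worth inspecting them before stripping characters blindly.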