cdli-gh / Sumerian-Translation-Pipeline

Ur III Period (Sumerian Language) information extraction pipeline, including Named Entity Recognition, Part-of-Speech Tagging and Machine Translation
MIT License
26 stars, 8 forks

Connected various POS and NER models to classes to check_branch #4

Status: Open. deepPublicGit opened this pull request 3 years ago.

deepPublicGit commented 3 years ago

https://github.com/cdli-gh/Sumerian-Translation-Pipeline/issues/3

Converted the POS and NER models to classes. While testing, I ran into problems with the Word_Emeddings files: glove50.txt and sumerian_word2vec_50.txt each contained a line that the model could not read, so I removed that line from both files.
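Rather than editing the embedding files by hand, the loader could skip unreadable lines itself. A minimal sketch, assuming each valid line is a token followed by 50 whitespace-separated floats (the dimension is taken from the file names); `parse_embedding_lines` and `load_text_embeddings` are hypothetical helpers, not functions from this repository:

```python
import numpy as np


def parse_embedding_lines(lines, dim=50):
    """Parse '<token> <v1> ... <v_dim>' lines into a {token: vector} dict.

    Malformed lines (wrong field count or non-numeric values) are skipped,
    and their 1-based line numbers are returned so they can be inspected.
    """
    embeddings = {}
    skipped = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) != dim + 1:
            skipped.append(lineno)
            continue
        try:
            vector = np.asarray(parts[1:], dtype=np.float32)
        except ValueError:
            skipped.append(lineno)
            continue
        embeddings[parts[0]] = vector
    return embeddings, skipped


def load_text_embeddings(path, dim=50):
    # errors="replace" keeps undecodable bytes from aborting the whole read;
    # such a line may still be skipped above if it no longer parses.
    with open(path, encoding="utf-8", errors="replace") as f:
        return parse_embedding_lines(f, dim=dim)
```

A line with the wrong number of fields (a word2vec-style `<vocab_size> <dim>` header would be one example) or with non-numeric values is reported by line number instead of breaking the load, so the original files can stay untouched.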

The following summarizes the issues I ran into while running the models (they also occur with the original CLI, prediction.py):

- POS_CRF, NER_CRF: prediction via pipeline working fine; POS_NER to CoNLL working fine.
- POS_HMM: prediction via pipeline working fine; POS_NER to CoNLL fails with "list index out of range".
- POS_Bi_LSTM, NER_Bi_LSTM: prediction via pipeline working fine; POS_NER to CoNLL working fine.
- POS_Bi_LSTM_CRF: prediction via pipeline working fine; POS_NER to CoNLL working fine.
- NER_Bi_LSTM_CRF: KeyError (possibly an issue with the sumerian_vocab.pkl file created during training).
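One common cause of a KeyError like the NER_Bi_LSTM_CRF one is a token at prediction time that is missing from the vocabulary pickled during training. A minimal sketch of a defensive lookup, assuming sumerian_vocab.pkl holds a plain dict mapping words to integer ids and contains an unknown-word entry (the key name `UNK` is hypothetical; the real key, if any, depends on the training script):

```python
import pickle

UNK_TOKEN = "UNK"  # hypothetical key; check what the training script actually stored


def load_vocab(path):
    # Assumes the pickle holds a {word: index} dict.
    with open(path, "rb") as f:
        return pickle.load(f)


def encode_tokens(tokens, word2idx, unk_token=UNK_TOKEN):
    """Map tokens to ids, sending out-of-vocabulary tokens to the unknown id.

    Returns the id list plus the tokens that were missing, so a KeyError
    can be traced to the exact words instead of crashing prediction.
    """
    if unk_token not in word2idx:
        raise KeyError(f"vocabulary has no {unk_token!r} entry for unseen words")
    unk_id = word2idx[unk_token]
    ids, missing = [], []
    for tok in tokens:
        idx = word2idx.get(tok)
        if idx is None:
            missing.append(tok)
            idx = unk_id
        ids.append(idx)
    return ids, missing
```

If the loaded vocabulary has no unknown-word entry at all, that would itself suggest the pickle was built differently from what the prediction code expects.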

Note: POS_Bi_LSTM_CRF works when MAX is set to 19.
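If MAX is the fixed sequence length the Bi-LSTM-CRF model was trained with (an assumption; the note only says that 19 works), inputs at prediction time have to match it exactly. A minimal sketch that pads short lines and splits long ones into MAX-sized windows instead of truncating them; `pad_to_length`, `to_fixed_windows` and `PAD_ID = 0` are hypothetical:

```python
MAX = 19    # length reported to work for POS_Bi_LSTM_CRF
PAD_ID = 0  # hypothetical padding id; use whatever the training code used


def pad_to_length(ids, max_len=MAX, pad_id=PAD_ID):
    """Right-pad a sequence of ids to exactly max_len."""
    if len(ids) > max_len:
        raise ValueError(f"sequence of length {len(ids)} exceeds max_len={max_len}")
    return list(ids) + [pad_id] * (max_len - len(ids))


def to_fixed_windows(ids, max_len=MAX, pad_id=PAD_ID):
    """Split ids into consecutive windows of max_len, padding the last one.

    Unlike plain truncation, no tokens are dropped; an empty input gives [].
    """
    return [
        pad_to_length(ids[start:start + max_len], max_len, pad_id)
        for start in range(0, len(ids), max_len)
    ]
```

Tags predicted for padded positions would then be discarded using each window's real length before the output is written to CoNLL.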