datquocnguyen / RDRPOSTagger

A fast and accurate POS and morphological tagging toolkit (EACL 2014)
http://rdrpostagger.sourceforge.net

RDRPOSTagger.py returns blank error #13

Closed: matgrioni closed this issue 7 years ago

matgrioni commented 7 years ago

I am using the following command within RDRPOSTagger/pSCRDRtagger:

python RDRPOSTagger.py ../Models/UniPOS/UD_Latin/la-upos.RDR ../Models/UniPOS/UD_Latin/la-upos.DICT rawDataPath

For some of the files I run it on, it works as expected. For others, such as the attached one, it outputs the following error:

=> Read a POS tagging model from ../Models/UniPOS/UD_Latin/la-upos.RDR

=> Read a lexicon from ../Models/UniPOS/UD_Latin/la-upos.DICT

=> Perform POS tagging on /home/grioni.2/NER/Preprocessing/Preprocessed/UNKNOWN/Tacitus.txt

ERROR ==>  "''"

===== Usage =====

#1: To train RDRPOSTagger on a gold standard training corpus:

python RDRPOSTagger.py train PATH-TO-GOLD-STANDARD-TRAINING-CORPUS

Example: python RDRPOSTagger.py train ../data/goldTrain

#2: To use the trained model for POS tagging on a raw text corpus:

python RDRPOSTagger.py tag PATH-TO-TRAINED-MODEL PATH-TO-LEXICON PATH-TO-RAW-TEXT-CORPUS

Example: python RDRPOSTagger.py tag ../data/goldTrain.RDR ../data/goldTrain.DICT ../data/rawTest

#3: Find the full usage at http://rdrpostagger.sourceforge.net !

I'm not sure where this error is coming from, since the message itself appears to be blank. The problem does not occur with the Java implementation, however:

java RDRPOSTagger ../Models/UniPOS/UD_Latin/la-upos.RDR ../Models/UniPOS/UD_Latin/la-upos.DICT rawDataPath

works for the same file.

Alexander_Severus.txt

datquocnguyen commented 7 years ago

Thanks for your report. I am not sure exactly where the error comes from, but the file you attached is not tokenized. RDRPOSTagger requires a tokenized (word-segmented) input file. Best, Dat.
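For readers hitting the same problem, a naive way to produce whitespace-separated tokens is sketched below. It assumes the raw input should have one sentence per line with tokens separated by single spaces (as in the sample files under /data). The helper names tokenize_line and tokenize_file are hypothetical and not part of RDRPOSTagger, and this regex tokenizer will not match a treebank's tokenization conventions exactly; it only separates runs of word characters from punctuation.

```python
import re

# Hypothetical helpers (not part of RDRPOSTagger): a naive regex tokenizer.
# Each run of word characters is one token, and every other non-space
# character (punctuation) becomes its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_line(line):
    """Return the tokens of one line, separated by single spaces."""
    return " ".join(TOKEN_RE.findall(line))


def tokenize_file(in_path, out_path):
    """Tokenize in_path line by line, assuming one sentence per input line.

    Lines that contain no tokens are dropped from the output.
    """
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            tokenized = tokenize_line(line)
            if tokenized:
                dst.write(tokenized + "\n")


if __name__ == "__main__":
    print(tokenize_line("Arma virumque cano, Troiae qui primus ab oris."))
    # prints: Arma virumque cano , Troiae qui primus ab oris .
```

Sentence splitting is deliberately left out: the sketch keeps whatever line structure the input already has.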

matgrioni commented 7 years ago

Thank you for responding. I had not noticed this requirement before, so I will try tokenizing the file in the format shown in /data. I will close the issue and re-open it if the problem persists after that.

Stormur commented 5 years ago

I'm getting the same error:

=> Read a POS tagging model from /home/flavio/Documenti/POS/RDRPOSTagger/Models/UniPOS/UD_Latin-ITTB23/la_ittb23-upos.RDR

=> Read a lexicon from /home/flavio/Documenti/POS/RDRPOSTagger/Models/UniPOS/UD_Latin-ITTB23/la_ittb23-upos.DICT

=> Perform POS tagging on /home/flavio/Documenti/POS/Testi_Tabelle/De_divinatione/Cic_DeDiv_SentWord_Tokenized_corretto_detersum_orizzontale.txt

ERROR ==>  "''"

There is probably an error in the file I used for training, since other models have no problem with the same file. However, I cannot identify it, since the file seems to meet all the requirements.

For training: latin_ittb-ud23_train_orizzontale.txt

To tag: Cic_DeDiv_SentWord_Tokenized_corretto_detersum_orizzontale.txt
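The "blank" message is consistent with a Python KeyError for the token '' (two apostrophes, commonly used as a closing-quote token): str() of a KeyError is the repr of the missing key, and the repr of that token is "''". The reproduction below is a minimal sketch under two assumptions: that the tagger wraps tagging in a broad exception handler that prints the exception, and that the failure is a plain dictionary lookup of a token missing from the lexicon. The lexicon contents and function names are hypothetical.

```python
# Hypothetical lexicon that has no entry for the token "''".
LEXICON = {"cano": "VERB", ",": "PUNCT"}


def tag_tokens(tokens, lexicon):
    """Strict lookup: raises KeyError for any token not in the lexicon."""
    return [lexicon[token] for token in tokens]


def main_message(tokens):
    """Mimic a broad 'except Exception' handler that prints the error."""
    try:
        tag_tokens(tokens, LEXICON)
        return None
    except Exception as e:
        # str(KeyError(key)) is repr(key); for the token '' that is "''".
        return "ERROR ==>  " + str(e)


if __name__ == "__main__":
    print(main_message(["cano", "''"]))
    # prints: ERROR ==>  "''"
```

Because the printed key is only two apostrophes wrapped in double quotes, the message is easy to misread as empty.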

datquocnguyen commented 5 years ago

You can either:

1) Fix this error by adding the following line to the la_ittb23-upos.DICT file:

'' PUNCT

2) Or use the latest RDRPOSTagger, which I have just updated. The update is a minor change to InitialTagger.py that handles this error, so you do not need to retrain any model.
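Both options can be sketched in Python. This is hypothetical and is not the actual InitialTagger.py change: it assumes each DICT line has the form WORD TAG separated by a space, as the '' PUNCT line suggests, and that the file is UTF-8. ensure_entry appends the missing entry (option 1), and initial_tag shows one way a tagger could avoid the KeyError by falling back to a default tag; the NOUN default is an arbitrary choice for illustration.

```python
# Hypothetical sketch, NOT the actual RDRPOSTagger code. Assumes each DICT
# line looks like "WORD TAG" and that the file is UTF-8 encoded.


def read_lexicon(lines):
    """Parse "WORD TAG" lines into a dict, splitting on the last space."""
    lexicon = {}
    for line in lines:
        line = line.rstrip("\n")
        if " " not in line:
            continue  # skip blank or malformed lines
        word, tag = line.rsplit(" ", 1)
        lexicon[word] = tag
    return lexicon


def ensure_entry(dict_path, word="''", tag="PUNCT"):
    """Append "WORD TAG" to the DICT file unless WORD already has an entry.

    Returns True if a line was appended, False if the entry already existed.
    """
    with open(dict_path, encoding="utf-8") as f:
        text = f.read()
    if word in read_lexicon(text.splitlines()):
        return False
    with open(dict_path, "a", encoding="utf-8") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new entry on its own line
        f.write(f"{word} {tag}\n")
    return True


def initial_tag(token, lexicon, default="NOUN"):
    """Look up a token's tag, falling back to a default instead of raising."""
    return lexicon.get(token, default)
```

Patching the DICT file only fixes the one missing token, whereas a fallback lookup like initial_tag prevents the crash for any token absent from the lexicon.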

Stormur commented 5 years ago

Now it works, thank you!