luminitavoicu closed this issue 4 years ago
I think their conversion script has a few minor bugs. Here's my updated version (that also shuffles the sentences, which you may not want):
In good news, we'll have Romanian models with vectors trained on RONEC available for spacy v2.3.0 soon!
Hi, thank you for the quick reply!
Good news: I used the updated script you provided, and the unnecessary "*" in the train file is no longer an issue. Moreover, after running the spaCy converter on the train CoNLL-U file with the command
python -m spacy convert train_ronec.conllu . --converter conllu
the NER tags appeared in the train JSON as well (before, they were missing), so this sounds like progress to me.
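A quick way to confirm that the NER tags really made it into the converted file is to count them. The following is only a sketch: it assumes the JSON follows the spaCy v2 training layout (a list of documents, each with "paragraphs", "sentences" and "tokens", where each token may carry a "ner" field), and the names load_docs and count_ner_tags as well as the example tokens are made up for illustration.

```python
import json
from collections import Counter


def load_docs(path):
    """Load a converted training file, e.g. load_docs("train_ronec.json")."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_ner_tags(docs):
    """Count the "ner" value of every token (assumed spaCy v2 JSON layout)."""
    counts = Counter()
    for doc in docs:
        for paragraph in doc.get("paragraphs", []):
            for sentence in paragraph.get("sentences", []):
                for token in sentence.get("tokens", []):
                    # Tokens without a "ner" key are counted separately.
                    counts[token.get("ner", "<missing>")] += 1
    return counts


# Tiny made-up document in the assumed layout, not taken from RONEC.
example = [{"id": 0, "paragraphs": [{"sentences": [{"tokens": [
    {"id": 0, "orth": "Ana", "ner": "U-PERSON"},
    {"id": 1, "orth": "doarme", "ner": "O"},
    {"id": 2, "orth": "."},
]}]}]}]

print(count_ner_tags(example))
```

If every token ends up under "<missing>" (or under a single placeholder value), the converter most likely did not pick up the entity column.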
Unfortunately, the model still gets stuck during training.
Try running spacy debug-data on the data to see if there are any errors or warnings (add -p ner to get it to skip the tagger/parser analysis)?
I posted a similar issue on the RONEC repository as well: https://github.com/dumitrescustefan/ronec/issues/2 because I wasn't sure if this was a spacy problem or if there was a problem with the RONEC conversion script.
Fortunately, they updated their script and all the problems are now gone. Apparently, spacy modified the converter and the compatibility with the script was affected.
Thank you for all the help!
Glad to hear it's working!
Hello,
I attempted to use the RONEC corpus with Spacy for NER and I encountered some problems while following the tutorial for using Spacy in the RONEC project: https://github.com/dumitrescustefan/ronec/tree/master/spacy
Firstly, I cloned the repository and tried to obtain the .json train and dev files using the convert_conllubio.py script and Spacy's convert tool, as shown in the tutorial:
!python3 ronec/spacy/train-local-model/convert_conllubio.py ronec/ronec/conllup/raw/ronec.conllup .
!python -m spacy convert train_ronec.conllubio . --converter conllubio
When I ran the second command, for the train data set I got this error:
When I looked at the train_ronec.conllubio file, I noticed that there were 11 columns on the first line instead of 10, as shown below:
I found that deleting the "*" on the first line solved this problem, but I couldn't really understand why this happened.
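For anyone hitting the same error, a check along these lines can locate malformed rows before running the converter. This is a minimal sketch and not part of the RONEC or spaCy tooling: it assumes the file is tab-separated with 10 fields per token line, and the function name find_bad_lines and the sample rows (including the stray "*" field) are made up for illustration.

```python
EXPECTED_COLUMNS = 10  # assumption: 10 tab-separated fields per token line


def find_bad_lines(lines, expected=EXPECTED_COLUMNS):
    """Return (line_number, column_count) for token lines with an unexpected width."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Blank lines separate sentences; lines starting with "#" are comments.
        if not stripped.strip() or stripped.startswith("#"):
            continue
        count = len(stripped.split("\t"))
        if count != expected:
            bad.append((number, count))
    return bad


# Hypothetical rows: the second line carries one extra leading field.
sample = [
    "# hypothetical comment line",
    "\t".join(["*", "1", "Ana", "Ana", "PROPN", "_", "_", "0", "root", "_", "O"]),
    "\t".join(["2", "doarme", "doarme", "VERB", "_", "_", "0", "root", "_", "O"]),
    "",
]

print(find_bad_lines(sample))  # prints [(2, 11)]
```

To run it on a real file, pass an open file handle instead of the sample list, for example find_bad_lines(open("train_ronec.conllubio", encoding="utf-8")).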
I moved on with the tutorial and I attempted to train the open-source BILSTM-CNN model found here: https://github.com/kamalkraj/Named-Entity-Recognition-with-Bidirectional-LSTM-CNNs with Spacy's train tool, using this command:
!python3 -m spacy train ro Named-Entity-Recognition-with-Bidirectional-LSTM-CNNs/models/ train_ronec.json dev_ronec.json -p ner
I noticed a very strange behaviour for this: the model got stuck at 36%, no matter how much time I let it run. This is the output I got:
Since it did not return any errors, I am not sure how to debug it, or if I am using it right.
Environment
I am running this on Google Colab. Here is some information about the environment: