Closed snthibaud closed 3 years ago
Hmm, I'm not entirely sure at this point, but this kind of error can indicate that there's not enough (usable) training data. My first guess would be that your NER annotation might not align well with the tokenization from the JapaneseTokenizer, which uses sudachipy.
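To see whether misalignment is the problem, you can check each annotation against the tokenizer yourself: Doc.char_span returns None when character offsets don't line up with token boundaries, and such spans are dropped as unusable training data. This is a minimal sketch (shown with a blank English pipeline for brevity; the same check applies to spacy.blank("ja"), and the example text and offsets are made up):

```python
import spacy

# A span whose character offsets cut through a token will not align.
nlp = spacy.blank("en")
text = "Apple is opening a store"
annotations = [(0, 5, "ORG"), (2, 8, "ORG")]  # second span starts mid-token

doc = nlp(text)
for start, end, label in annotations:
    span = doc.char_span(start, end, label=label)
    print((start, end), "aligned" if span is not None else "misaligned")
# (0, 5) aligned
# (2, 8) misaligned
```

Running this over your corpus and counting the None results gives a quick estimate of how much training data is actually usable.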
What is the output of the NER section of spacy debug data -V config.cfg (with the corpus path options as necessary)?
This issue has been automatically closed because there has been no response to a request for more information from the original author. With only the information that is currently in the issue, there's not enough information to take action. If you're the original author, feel free to reopen the issue if you have or find the answers needed to investigate further.
I was facing a similar issue; here is what I did. As suggested by @adrianeboyd, a quick run of
spacy debug data -V /content/config.cfg --paths.train /content/train.spacy --paths.dev /content/eval.spacy
surfaced several warnings and one error.
Something like this:
In my case, the error was due to one entity having trailing spaces. A simple .strip() on the entity text resolved the issue.
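The fix above can be sketched as follows: trim leading and trailing whitespace from each entity and adjust the character offsets to match before building the training data. The strip_entity helper and the (start, end, label) annotation shape are assumptions for illustration, not spaCy API:

```python
def strip_entity(text, start, end, label):
    """Shrink an entity's character offsets so the span has no
    leading or trailing whitespace."""
    ent = text[start:end]
    new_start = start + (len(ent) - len(ent.lstrip()))
    new_end = end - (len(ent) - len(ent.rstrip()))
    return new_start, new_end, label

text = "Tokyo  is the capital"
# Original annotation accidentally includes two trailing spaces.
print(strip_entity(text, 0, 7, "GPE"))  # → (0, 5, 'GPE')
```

Applying this to every annotation before calling Doc.char_span avoids the whitespace-related errors that spacy debug data reports.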
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
How to reproduce the behaviour
I was trying to train an NER model with the following config:
Then I encountered the following stacktrace:
The number of documents could be a bit high (~500,000).
Info about spaCy