markoferme closed this issue 2 years ago
This problem arises because classla version 1.0.2 is not compatible with the obeliks tokenizer version 1.1.0 or later.
Solution: Downgrade the obeliks library to version 1.0.6:
pip uninstall obeliks
pip install obeliks==1.0.6
The upcoming release of classla will support the latest obeliks version as well.
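To confirm which obeliks version is installed before or after the downgrade, something like the following could be used. This is only a sketch: the helper names `parse_version` and `obeliks_is_compatible` are made up for illustration, the 1.1.0 threshold is taken from the explanation above, and the parsing assumes plain numeric versions such as 1.0.6.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.0.6' into (1, 0, 6). Assumes purely numeric, dotted versions."""
    return tuple(int(part) for part in text.split("."))


def obeliks_is_compatible(installed, first_incompatible="1.1.0"):
    """Return True if the given obeliks version is below the incompatible one.

    The default threshold comes from this thread (classla 1.0.2 does not
    work with obeliks 1.1.0 or later); it is not read from classla itself.
    """
    return parse_version(installed) < parse_version(first_incompatible)


if __name__ == "__main__":
    try:
        installed = version("obeliks")
    except PackageNotFoundError:
        print("obeliks is not installed")
    else:
        status = "compatible" if obeliks_is_compatible(installed) else "too new"
        print(f"obeliks {installed}: {status} for classla 1.0.2")
```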
When trying to use classla on texts that were generated by converting other formats (pdf, docx, ...), an error is thrown:
The error occurs when a sentence returned by the tokenizer (obeliks in my case) contains no words.
Steps to reproduce the behavior:
A fix would probably be to check whether the tokenizer returns any words at all for a sentence, and to skip the sentence if it does not.
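The check suggested above could look roughly like this. It is only a sketch and not classla's or obeliks's actual code: it assumes the tokenizer output has already been turned into a list of sentences, each a list of token strings, and `drop_empty_sentences` is a hypothetical helper name.

```python
def drop_empty_sentences(sentences):
    """Keep only sentences that contain at least one non-blank token.

    `sentences` is assumed to be a list of lists of token strings; this
    layout is an illustration, not the real tokenizer output format.
    """
    return [
        sentence
        for sentence in sentences
        if any(token.strip() for token in sentence)
    ]


if __name__ == "__main__":
    tokenized = [["Hello", "world", "."], [], ["   "], ["Next", "one"]]
    for sentence in drop_empty_sentences(tokenized):
        print(sentence)
```

Sentences that are empty or contain only whitespace tokens, which is what converted pdf or docx text can produce, are dropped before they reach the rest of the pipeline.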