Closed: itzsimpl closed this issue 2 years ago
This issue is specific to obeliks 1.0.6 and was resolved in a later obeliks version. Unfortunately, classla 1.0.2 has problems with later versions of obeliks. To get around this, you can either wait for the upcoming classla release or use one of the following two workarounds with the current version:
# Workaround 1: tokenize with reldi instead of obeliks
import classla
text = "..."
nlp = classla.Pipeline(lang='sl', processors='tokenize,pos', tokenize_library='reldi')
doc = nlp(text)
# Workaround 2: run obeliks directly and pass its CoNLL-U output to classla
import obeliks
import classla
text = "..."
pretokenized_text = obeliks.run(text, conllu=True)
nlp = classla.Pipeline(lang='sl', processors='tokenize,pos', tokenize_pretokenized='conllu')
doc = nlp(pretokenized_text)
The new release of classla (v1.1.0) now works with the latest obeliks.
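If you are not sure whether your environment still needs one of the workarounds, a quick version check can help. This is only a minimal sketch: the helper names (`parse_version`, `needs_workaround`, `installed_version`) are made up for illustration, and the 1.1.0 cutoff is taken from the comments in this thread rather than from any compatibility table.

```python
import re
from importlib import metadata


def parse_version(version):
    """Turn a string like '1.0.6' into a 3-tuple of ints, e.g. (1, 0, 6)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '1.1' so tuple comparison behaves as expected.
    return tuple((parts + [0, 0, 0])[:3])


def needs_workaround(classla_version):
    """Assumption based on this thread: classla before 1.1.0 needs a workaround."""
    return parse_version(classla_version) < (1, 1, 0)


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("classla", "obeliks"):
        print(name, installed_version(name))
    classla_version = installed_version("classla")
    if classla_version is not None and needs_workaround(classla_version):
        print("classla is older than 1.1.0: upgrade or use a workaround")
```

If the check reports a classla version below 1.1.0, upgrading classla should be the simpler fix; the workarounds are only needed if you have to stay on 1.0.2.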
Classla 1.0.2 with obeliks 1.0.6 hangs on really long words. Example: