centre-for-humanities-computing / greevaluation

Evaluation workflows for Ancient Greek language models
MIT License

CLTK tokens don't match #2

Closed: x-tabdeveloping closed this issue 1 year ago

x-tabdeveloping commented 1 year ago

CLTK already makes errors at the tokenization stage, and spaCy's evaluation scripts only work if both documents contain the same tokens. We could try removing punctuation from both documents, but even that isn't guaranteed to work. :(
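A minimal sketch of the check described above, in plain Python with no spaCy or CLTK dependency. It finds the first position where two token lists diverge, then tries the punctuation-stripping workaround. The helper names (`is_punct`, `strip_punct`, `first_mismatch`) and the example token lists are hypothetical. Treating any token made only of Unicode punctuation (general category `P*`) as punctuation is also an assumption, not CLTK's or spaCy's own definition.

```python
import unicodedata
from typing import List, Optional


def is_punct(token: str) -> bool:
    # Assumption: a token counts as punctuation if every character is in a
    # Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    return bool(token) and all(unicodedata.category(ch).startswith("P") for ch in token)


def strip_punct(tokens: List[str]) -> List[str]:
    # Drop standalone punctuation tokens; punctuation glued to a word is kept.
    return [t for t in tokens if not is_punct(t)]


def first_mismatch(a: List[str], b: List[str]) -> Optional[int]:
    # Return the first index where the two tokenizations differ,
    # or None if they are identical.
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical gold tokens vs. two hypothetical tokenizer outputs.
gold = ["ἐν", "ἀρχῇ", "ἦν", "ὁ", "λόγος", "\u0387"]  # U+0387 GREEK ANO TELEIA
dropped = ["ἐν", "ἀρχῇ", "ἦν", "ὁ", "λόγος"]             # punctuation token missing
glued = ["ἐν", "ἀρχῇ", "ἦν", "ὁ", "λόγος\u0387"]        # punctuation attached to word

print(first_mismatch(gold, dropped))                            # 5
print(first_mismatch(strip_punct(gold), strip_punct(dropped)))  # None
print(first_mismatch(strip_punct(gold), strip_punct(glued)))    # 4
```

The third case shows why the workaround isn't guaranteed. If the tokenizer leaves punctuation attached to a word, removing standalone punctuation tokens cannot bring the two sequences back into alignment.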

x-tabdeveloping commented 1 year ago

It hath been fixeth