centre-for-humanities-computing / greevaluation

Evaluation workflows for Ancient Greek language models
MIT License

Udpipe eval #7

Closed by jankounchained 1 year ago

jankounchained commented 1 year ago

Problem: I'm running into the same issue you used to see with cltk, where the reference tokens don't match the predicted tokens.

ValueError: [E949] Unable to align tokens for the predicted and reference docs. It is only possible to align the docs when both texts are the same except for whitespace and capitalization. The predicted tokens start with: ['ζῶσι', 'δὲ', 'καὶ', 'οὗτοι', 'τὸν', 'αὐτὸν', 'τρόπον', 'τοῖς', 'θρεψαμένοις', ',']. The reference tokens start with: ['ζῶσι', 'δὲ', 'καὶ', 'οὗτοι', 'τὸν', 'αὐτὸν', 'τρόπον', 'τοῖς', 'θρεψαμένοις', ','].
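The two token lists in the error look identical as printed, so the difference is presumably at the code point level. One possible cause with polytonic Greek (an assumption, not something confirmed in this issue) is that the two pipelines emit different Unicode normalization forms, e.g. precomposed vs. decomposed accents. A minimal sketch for checking this is below; `codepoints` and `diff_tokens` are hypothetical helper names, and the sample data is illustrative, not taken from the real corpus.

```python
import unicodedata


def codepoints(token):
    """Return the code points of a token as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in token]


def diff_tokens(predicted, reference):
    """Report token pairs that differ, and whether NFC normalization reconciles them.

    Note: zip() stops at the shorter list, so a length mismatch
    between the two lists has to be checked separately.
    """
    report = []
    for i, (pred, ref) in enumerate(zip(predicted, reference)):
        if pred == ref:
            continue
        equal_after_nfc = (
            unicodedata.normalize("NFC", pred) == unicodedata.normalize("NFC", ref)
        )
        report.append(
            {
                "index": i,
                "predicted": codepoints(pred),
                "reference": codepoints(ref),
                "equal_after_nfc": equal_after_nfc,
            }
        )
    return report


# Illustrative data (assumption): the same word, once precomposed and once decomposed.
predicted = ["\u03c4\u1f78\u03bd"]        # tau, omicron-with-varia, nu
reference = ["\u03c4\u03bf\u0300\u03bd"]  # tau, omicron, combining grave, nu

for row in diff_tokens(predicted, reference):
    print(row)
```

If every reported pair has `equal_after_nfc` set to true, normalizing both the reference text and the model output to the same form before building the docs may be enough to let the alignment succeed. If some pairs still differ, the code point dumps should show which characters are responsible.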