Rouge score - Githubissues

bigscience-workshop / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

MIT License

101 stars, 30 forks

I probably would not recommend it for Spanish, or any other "normal" space-delimited language, in its current state. The default tokenizer used in rouge_scorer replaces every character that is not an (English) alphanumeric with a space, so accented letters are lost. For example, the text "Cristóbal está ayudando a su Abuela" would come out as something like "Crist bal est ayudando a su Abuela": the ó and á are removed, and "Cristóbal" is split in two. See the tokenize definition here: https://github.com/google-research/google-research/blob/0aa035ff363066089612fb37e3e137a71cadb9c0/rouge/tokenize.py#L50-L61

That said, if you could loosen the non_alpha_numeric pattern so that it leaves accented letters (and similar characters) alone, it should be fine.
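To make the effect concrete, here is a minimal sketch of the behaviour described above. It is not the rouge_score code itself: the `ASCII_ONLY` pattern is my approximation of the default one, and `UNICODE_AWARE` and `simple_tokenize` are hypothetical names I made up to show one possible way of loosening it (stemming is left out entirely).

```python
import re
import unicodedata

# Assumed approximation of the default pattern: anything outside a-z and 0-9.
ASCII_ONLY = re.compile(r"[^a-z0-9]+")

# Hypothetical loosened pattern: in Python 3, \w is Unicode-aware for str
# patterns, so accented letters count as word characters. "_" is excluded
# explicitly because \w would otherwise keep it.
UNICODE_AWARE = re.compile(r"[\W_]+")


def simple_tokenize(text, pattern):
    """Lowercase, replace every match of `pattern` with a space, then split."""
    # Normalize to NFC so that "o" + combining accent becomes a single "ó".
    text = unicodedata.normalize("NFC", text).lower()
    return pattern.sub(" ", text).split()


text = "Cristóbal está ayudando a su Abuela"

ascii_tokens = simple_tokenize(text, ASCII_ONLY)
unicode_tokens = simple_tokenize(text, UNICODE_AWARE)

print(ascii_tokens)    # accented letters are gone and "cristóbal" is split
print(unicode_tokens)  # accented letters survive
```

With the ASCII-only pattern, the reference and the prediction both lose their accented characters, so ROUGE ends up comparing fragments such as "crist" and "bal" rather than the real words. Whether the Unicode-aware variant is appropriate depends on the language; it is only meant to show the direction of the fix.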

Rouge score #157