Open slobentanzer opened 1 month ago
There have been reports of performance fluctuations in ChatGLM with respect to input language. https://www.nature.com/articles/d41586-024-01495-6
With the help of a fluent Chinese speaker, we could translate part of the BioChatter benchmark into Chinese to evaluate the impact of input language on performance. We already take a similar approach for German in our medical exam dataset (#157).
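A minimal sketch of how the per-language comparison could be scored, assuming each benchmark case yields a pass/fail outcome tagged with its input language. The function name `accuracy_by_language`, the `(language, is_correct)` record shape, and the example outcomes are hypothetical illustrations, not part of the existing BioChatter benchmark code.

```python
from collections import defaultdict


def accuracy_by_language(results):
    """Compute the fraction of correct answers per input language.

    `results` is an iterable of (language, is_correct) pairs; this record
    shape is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for language, is_correct in results:
        totals[language] += 1
        correct[language] += int(is_correct)
    return {lang: correct[lang] / totals[lang] for lang in totals}


# Made-up placeholder outcomes, NOT real benchmark results.
example_results = [
    ("en", True), ("en", True), ("en", False), ("en", True),
    ("zh", True), ("zh", False), ("zh", False), ("zh", True),
]

scores = accuracy_by_language(example_results)
# Difference in accuracy between the original and translated test cases.
gap = scores["en"] - scores["zh"]
```

Running the same test cases in both languages and comparing the per-language accuracy would keep the comparison paired, so any gap can more plausibly be attributed to the input language rather than to differences in the questions.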