aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

Chinese model and content support #202

Closed weinick closed 4 months ago

weinick commented 4 months ago

Hi experts, does fmeval support Chinese QA evaluation, and LLMs such as ChatGLM and Baichuan that are deployed on a SageMaker endpoint? Thanks.

danielezhu commented 4 months ago

Hi, you should be able to invoke any SageMaker-deployed endpoint using our SageMakerModelRunner.

Currently, you can use our QA Accuracy evaluation algorithm with Chinese text, but the computed metrics may not match what you expect, mainly because Chinese text, unlike English, is usually written without spaces between words.

For example, if the model_output is "学中文" and the target_output is "中文", the _f1_score function in our QA Accuracy code will output 0.0. This is because _f1_score (and other functions, like _precision) internally calls _split, which splits text on whitespace. Since neither string contains any whitespace, _split("学中文") returns the set {"学中文"}, while _split("中文") returns the set {"中文"}. These two sets do not overlap at all, so the computed metric is 0.0.
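To make the behavior above concrete, here is a minimal standalone sketch. It is not fmeval's actual code: `split_on_whitespace` and `token_f1` are simplified, hypothetical stand-ins for the `_split` and `_f1_score` functions described above, and `split_on_characters` is only an assumed workaround for illustration, not something fmeval provides.

```python
from typing import Callable, Set


def split_on_whitespace(text: str) -> Set[str]:
    """Hypothetical stand-in for _split: the set of whitespace-separated tokens."""
    return set(text.split())


def split_on_characters(text: str) -> Set[str]:
    """Assumed workaround (not part of fmeval): treat each non-space character as a token."""
    return {ch for ch in text if not ch.isspace()}


def token_f1(
    model_output: str,
    target_output: str,
    tokenize: Callable[[str], Set[str]] = split_on_whitespace,
) -> float:
    """Simplified set-overlap F1 between the tokens of the two strings."""
    predicted = tokenize(model_output)
    expected = tokenize(target_output)
    common = predicted & expected
    if not common:
        # No shared tokens, so precision and recall are both zero.
        return 0.0
    precision = len(common) / len(predicted)
    recall = len(common) / len(expected)
    return 2 * precision * recall / (precision + recall)


# Whitespace splitting treats each Chinese string as a single token.
print(split_on_whitespace("学中文"))   # {'学中文'}
print(token_f1("学中文", "中文"))      # 0.0

# The English equivalent has spaces, so the shared word is counted.
print(round(token_f1("learn Chinese", "Chinese"), 3))

# Character-level tokens recover the overlap between the Chinese strings.
print(round(token_f1("学中文", "中文", tokenize=split_on_characters), 3))
```

Character-level splitting is only one possible choice. A proper Chinese word segmenter would give word-level tokens, but that would be an extra dependency and is outside what the library does today.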