Open sz2three opened 1 year ago

It supports Chinese, but does it also work for Korean and Japanese?

You can check Section 4.4 of the MTEB paper (https://arxiv.org/pdf/2210.07316.pdf), where https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco is benchmarked on many languages, including Korean and Japanese, against other models. As it has not seen those languages extensively in pre-training, it performs rather poorly on them.

You may want to use a different model for those languages (check the leaderboard at https://huggingface.co/spaces/mteb/leaderboard to see which model performs best on them).