Open asenasen123 opened 10 months ago
Sure, if you want to fine-tune, you can follow some of what is outlined in this issue: https://github.com/Muennighoff/sgpt/issues/2
For asymmetric search (e.g. retrieval), you can also try https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco, which has seen lots of Chinese during pretraining and might be good enough.
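For context, asymmetric search ranks candidate passages by the similarity of their embeddings to the query embedding. Here is a minimal, model-free sketch of just the ranking step in plain Python; in practice the vectors would come from a model such as sgpt-bloom-7b1-msmarco, and the toy 2-d vectors below are only placeholders:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_cosine(query, docs):
    """Return document indices sorted by cosine similarity to the query, best first."""
    scores = [cosine(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

# Toy 2-d vectors standing in for real model embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0],   # orthogonal to the query
        [1.0, 0.1],   # nearly parallel to the query
        [0.5, 0.5]]   # in between
print(rank_by_cosine(query, docs))  # → [1, 2, 0]
```

The same ranking logic applies regardless of which embedding model produces the vectors.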
Do many SGPT models on Hugging Face support Chinese?
If I want to fine-tune the sgpt model, do I just change the dataset?
I think only the bloom ones perform well for Chinese. Yes you can just change the dataset.
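A note on "just change the dataset": the exact file format depends on the training script you use, but contrastive fine-tuning typically expects (query, positive passage) pairs. A minimal sketch of serializing custom Chinese pairs as JSONL; the field names `query` and `pos` here are illustrative, not a fixed schema:

```python
import json

# Hypothetical raw data: each entry pairs a Chinese query with a relevant passage.
raw_pairs = [
    ("什么是机器学习?", "机器学习是人工智能的一个分支。"),
    ("如何训练神经网络?", "训练神经网络通常使用反向传播算法。"),
]

def to_training_lines(pairs):
    """Serialize (query, positive) pairs as JSONL records for contrastive training."""
    return [json.dumps({"query": q, "pos": p}, ensure_ascii=False) for q, p in pairs]

for line in to_training_lines(raw_pairs):
    print(line)
```

Check your chosen training script's expected keys and adjust the record shape accordingly.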
Which Chinese dataset should I evaluate the fine-tuned model on?
I would evaluate on the Chinese datasets in MTEB. If you train a Retrieval model, you can try the Chinese Retrieval datasets from C-MTEB: https://huggingface.co/spaces/mteb/leaderboard
Also see https://github.com/embeddings-benchmark/mteb/pull/134
Are evaluation indicators also Pearson and Spearman?
For retrieval datasets it's nDCG@10. But don't worry about the evaluation: if you use MTEB, it takes care of calculating the scores automatically.
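For reference, here is a minimal sketch of how nDCG@k is computed for a single query. MTEB does this for you; this is just to show what the score measures:

```python
import math

def ndcg_at_k(ranked_relevances, k=10):
    """nDCG@k for one query.

    ranked_relevances lists the graded relevance of each retrieved document,
    in the order the model ranked them (position 0 = top result).
    """
    def dcg(rels):
        # Each relevance is discounted by the log of its rank (1-indexed).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 1]))  # perfect ranking → 1.0
```

A ranking that puts the most relevant documents first scores 1.0; pushing them down the list lowers the score.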
Thank you very much!
What about fine-tuning for Spanish?
Sure, you can do that too. https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco has also seen a lot of Spanish, so it may work well for you.
Could you please tell me how I can fine-tune on my custom Chinese datasets?