Closed qwopqwop200 closed 9 months ago
Thank you for the interest!
If you want ALMA to support languages beyond German, Chinese, Czech, Russian, and Icelandic (the languages ALMA originally supported), the best approach is to first fine-tune on monolingual data in your target language. If your target language is one of them, a smaller dataset like 2k CPO examples should be totally fine.
I have already finished training ALMA based on Ko-SOLAR 10.7B, and now I just need to fine-tune it with CPO data. Would 2k examples be sufficient in that case?
Yes, it should be sufficient. But please be careful of the quality of the CPO data.
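For reference, CPO preference data is typically a set of triplets: a translation prompt, a preferred ("chosen") translation, and a dispreferred ("rejected") one. Below is a minimal, illustrative sketch of building and sanity-checking such triplets for an English→Korean pair; the prompt wording and helper names are assumptions, not ALMA's exact pipeline:

```python
# Hypothetical sketch of CPO preference data for one language pair
# (English -> Korean here, purely illustrative).

def make_cpo_example(src: str, chosen: str, rejected: str) -> dict:
    """Build one preference triplet in the prompt/chosen/rejected format."""
    # Prompt template is an assumption modeled on ALMA-style translation prompts.
    prompt = f"Translate this from English to Korean:\nEnglish: {src}\nKorean:"
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

def validate(example: dict) -> bool:
    """Basic quality gates: non-empty fields and a real preference gap."""
    return (
        all(example[k].strip() for k in ("prompt", "chosen", "rejected"))
        and example["chosen"] != example["rejected"]
    )

dataset = [
    make_cpo_example("Good morning.", "좋은 아침입니다.", "좋은 아침."),
]
assert all(validate(ex) for ex in dataset)
```

Even with only ~2k triplets, simple checks like `validate` help enforce the data-quality point above: every "chosen" translation should be genuinely better than its "rejected" counterpart.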
Thank you for your amazing work.
I am thinking of building a bidirectional translator with ALMA-R that supports only a single language pair. How much CPO data do you expect this would need?
Do I need ~22k examples as in ALMA-R, or would a smaller amount of data suffice?