fe1ixxu / ALMA

State-of-the-art LLM-based translation models.
MIT License

Release `Random` and `Filtered` parallel corpora #14

Closed by zwhe99 11 months ago

zwhe99 commented 11 months ago

Thanks for your work! I noticed that you also ran an ablation on parallel data in Table 3, using Random and Filtered variants of the parallel data. Could you release these datasets so that we can better reproduce the results?

fe1ixxu commented 11 months ago

They are Microsoft's internal datasets. I will discuss with my colleagues the best way to release them. Thank you!

zwhe99 commented 11 months ago

Thank you for your reply.