fe1ixxu / ALMA

State-of-the-art LLM-based translation models.
MIT License

About the interleave probability selections #4

Closed: Aniruddha-JU closed this issue 9 months ago

Aniruddha-JU commented 9 months ago

Congrats on the great work, and thanks for sharing the nice GitHub repo!

I have one question: how did you decide on the interleave probability percentages? Did you follow any rules or previous work?

Aniruddha-JU commented 9 months ago

I am asking about the stage 1 pre-training.

fe1ixxu commented 9 months ago

Thanks for your interest! The reasons behind the interleave probability selection for stage 1 are given in Appendix D.1 of the paper.
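
For readers unfamiliar with the term, here is a minimal sketch of what an "interleave probability" does when several monolingual datasets are mixed: each training example is drawn from one source, and the source is chosen at random with a fixed per-source probability. This is an illustration only, not the ALMA training code. The function name `interleave`, the language codes, and the probabilities 0.5 / 0.3 / 0.2 are all hypothetical placeholders; the values actually used for stage 1 are the ones discussed in Appendix D.1 of the paper.

```python
import random
from collections import Counter
from typing import Dict, Iterator, List, Tuple


def interleave(
    sources: Dict[str, List[str]],
    probs: Dict[str, float],
    n: int,
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield n (source_name, example) pairs.

    For every draw, the source is picked at random according to `probs`.
    Hypothetical helper for illustration; not part of the ALMA repository.
    """
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    rng = random.Random(seed)
    names = list(sources)
    weights = [probs[name] for name in names]
    cursors = {name: 0 for name in names}  # next unread index per source

    for _ in range(n):
        name = rng.choices(names, weights=weights, k=1)[0]
        data = sources[name]
        # Wrap around so a small source can still be sampled repeatedly.
        example = data[cursors[name] % len(data)]
        cursors[name] += 1
        yield name, example


if __name__ == "__main__":
    # Placeholder corpora and placeholder probabilities (assumptions).
    corpora = {
        "en": ["en sentence 1", "en sentence 2"],
        "de": ["de sentence 1", "de sentence 2"],
        "cs": ["cs sentence 1"],
    }
    mix = {"en": 0.5, "de": 0.3, "cs": 0.2}

    counts = Counter(name for name, _ in interleave(corpora, mix, n=10_000))
    for name in corpora:
        print(name, counts[name] / 10_000)
```

Over many draws, the share of examples from each source approaches its configured probability, which is why the choice of these percentages controls how much each language contributes to training.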