Open svupper opened 1 year ago
Hi @svupper,
Meta's LLaMA models were trained on a massive amount of data - 1.0T to 1.4T tokens depending on model size, using 2048 A100 80GB GPUs over a period of roughly 5 months. Continuing the pre-training of a LLaMA model on a French corpus is definitely a promising approach to improve its performance on French. However, this option is still quite expensive and may require significant computational resources. I'm currently pre-training it on a small French dataset to see how much it improves. Stay tuned!
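In case it helps others trying the same thing, here is a minimal sketch of what continued pre-training could look like with Hugging Face Transformers. To be clear, this is not my actual setup: the checkpoint name, the French corpus (an OSCAR subset), and every hyperparameter below are placeholders for illustration.

```python
# Minimal sketch of continued pre-training of a causal LM on a French corpus.
# All names and hyperparameters below are illustrative assumptions, not the
# actual configuration used in this thread.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "huggyllama/llama-7b"  # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LLaMA tokenizers ship without a pad token; reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Small French corpus slice for a quick experiment (placeholder dataset).
dataset = load_dataset("oscar", "unshuffled_deduplicated_fr", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-fr-continued",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=32,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice you would likely need DeepSpeed/FSDP or parameter-efficient methods (e.g. LoRA) to make this fit on available hardware, but the overall loop is the same.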
Creating a French LLaMA version by translating the RedPajama dataset