Hi, thank you for the great work.
I am curious what the pre-training samples look like across different languages. If possible, could you please provide a sample dataset?
It would also be a great help if you could point me to the pre-processing (for pre-training) and pre-training scripts.