baaivision / Emu3

Next-Token Prediction is All You Need
Apache License 2.0
1.81k stars 71 forks

JSON format in pre-training stage for Emu3 #34

Open leofan90 opened 3 weeks ago

leofan90 commented 3 weeks ago

Hello! Thanks for your fantastic open-source work! It's exciting. I have a few questions about constructing our data format for fine-tuning. Can you describe the JSON format of your pre-training and fine-tuning datasets? It would be great if you could share a sample of your pre-training data format and your dataset preprocessing code. Thanks for your time, and I look forward to your reply!