baaivision / Emu3

Next-Token Prediction is All You Need
Apache License 2.0
1.81k stars 71 forks

Details Regarding Post-Training #29

Open Doctor-James opened 3 weeks ago

Doctor-James commented 3 weeks ago

How is post-training conducted for the two tasks, multimodal understanding and image generation? Are they trained jointly, as in Show-O, or separately? Also, what is the approximate total number of training samples, and what is the ratio between the two tasks?