Open liziming5353 opened 3 months ago
Why does the data in stages 2 and 3 contain pure-text Q&A without images or videos?
According to DeepSeek-VL,
Maintaining a significant proportion of language data—specifically, at least 70%—is essential to preserve the integrity of language knowledge within the model.
Why does the data in stages 2 and 3 contain pure-text Q&A without images or videos?