lucasjinreal opened this issue 5 months ago
I also changed the LLM to Qwen1.5, and performance improved somewhat.
It feels like our directions are very similar; if you are interested, leave your contact information so we can communicate.
Allava already has a Chinese version. What do you mean by DeepSeek's hybrid? Mini-Gemini is already a hybrid architecture. The InternLM-XComposer data could be even dirtier. The ShareGPT4V dataset should already be included.
There is a Chinese version of Allava, but both the Chinese and English versions are dirty. In allava-cn there are many cases of image-text mismatch, translation misalignment, and translation hallucination. For example, grep for "宁静湖畔" ("tranquil lakeside") in allava-cn; the hits are very likely image-text mismatches. Therefore both allava-en and allava-cn need cleaning, and adding the cleaned allava-cn also improves the benchmark numbers.
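The grep-based cleaning described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the record fields (`conversations` / `value`) mimic a common LLaVA-style JSON schema, and the suspect-phrase list is just the one example from the discussion.

```python
# Hypothetical sketch of phrase-based filtering for allava-cn.
# Schema (conversations/value) and the phrase list are assumptions,
# not the real dataset layout or a complete mismatch detector.
SUSPECT_PHRASES = ["宁静湖畔"]  # phrases that correlate with image-text mismatch

def is_suspect(sample: dict) -> bool:
    """Return True if any conversation turn contains a suspect phrase."""
    for turn in sample.get("conversations", []):
        text = turn.get("value", "")
        if any(phrase in text for phrase in SUSPECT_PHRASES):
            return True
    return False

def clean(samples: list) -> list:
    """Drop samples flagged as likely image-text mismatches."""
    return [s for s in samples if not is_suspect(s)]

if __name__ == "__main__":
    data = [
        {"id": "a", "conversations": [{"value": "一张宁静湖畔的照片"}]},
        {"id": "b", "conversations": [{"value": "a photo of a cat"}]},
    ]
    print([s["id"] for s in clean(data)])  # → ['b']
```

In practice one would grow the phrase list from manual spot checks, or replace it with a CLIP image-text similarity threshold, but the idea is the same: find a cheap signal that flags mismatched pairs, then drop or re-translate them.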
Mini-Gemini has a hybrid structure, but in my experiments deepseek-vl's turned out slightly better.
Regarding the InternLM-XComposer data, I specifically mean the SFT-phase data, such as AOKVQA, OKVQA, and LVIS.
How did you clean the Allava data and manually translate the Chinese version? Would you share the data after cleaning? That would be very nice. Also, has InternLM-XComposer open-sourced their SFT data?
Hi, I have experimented with the Mini-Gemini architecture on the Qwen series of models, and it performs well.
However, the performance is not strong enough compared to some SOTA small models such as MiniCPM-V 2 and LLaVA-UHD, which use very large input resolutions and image-slicing techniques.
As such, I am just wondering: how can we further push the boundary of Mini-Gemini and make Mini-Gemini great again?
The baseline I currently get from Qwen-7B is roughly the same as Gemma-7B's on MMMU, which is not very satisfying.
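The slicing technique mentioned above (as used by LLaVA-UHD and MiniCPM-V) can be sketched in terms of plain coordinate arithmetic: resize the image to a grid of encoder-sized tiles, encode each tile plus a downscaled global view. This is an illustrative sketch; the crop size (336, typical of ViT-L/336 encoders) and the 2x2 grid are assumptions, not the exact recipe of either model.

```python
# Hypothetical sketch of high-resolution image slicing:
# compute the resize target and per-tile crop boxes for a cols x rows grid.
# crop=336 and grid=(2, 2) are illustrative defaults, not either model's config.
def slice_coords(width, height, crop=336, grid=(2, 2)):
    """Return (target_size, boxes) where target_size is the (w, h) to resize
    the image to, and boxes are (left, top, right, bottom) tile crops."""
    cols, rows = grid
    target_w, target_h = crop * cols, crop * rows
    boxes = [
        (c * crop, r * crop, (c + 1) * crop, (r + 1) * crop)
        for r in range(rows)
        for c in range(cols)
    ]
    return (target_w, target_h), boxes

size, boxes = slice_coords(1000, 800)
print(size)        # → (672, 672)
print(len(boxes))  # → 4
```

Each tile is then fed through the vision encoder independently, and the resulting token sequences are concatenated (along with the global view's tokens), so the model effectively sees a much larger input without changing the encoder's fixed resolution.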
Here are some thoughts on further improvements that are on my mind.
So here is what I want to talk about: how exactly should we make these improvements?
Hoping for your discussion and insights; please guide me down the right path.