ByungKwanLee / MoAI

Official PyTorch implementation of the technical part of Mixture of All Intelligence (MoAI), which improves performance on numerous zero-shot vision-language tasks. (Under Review)

It is Amazing! #1

Open atazangene opened 3 months ago

atazangene commented 3 months ago

I just saw this model and I think it's really amazing; it's great work. In order to improve this model, I have some suggestions:

Compare the model with the latest open/closed models

I think it would be better to compare it with the latest models, such as LLaVA 1.6 and Qwen Max, which are currently the highest-end models available.

Release a 34B model to beat the competitors

If you have sufficient funds and resources, I think it would be beneficial to release a 34B model.

Create a video tutorial on how to fine-tune this model

One of the biggest problems with other models is that they only provide limited text documentation on how to fine-tune them. With a video tutorial, I believe you could make a greater impact online and draw a lot of attention to this model.

ByungKwanLee commented 3 months ago

Thanks for your interest in our work and for your suggestions! MoAI-7B was compared with LLaVA1.6-13B and -34B in Figure 6 of our paper. We will compare MoAI with Qwen-Max and Gemini-Ultra, but since our work mainly aims to give LLVMs real-world scene understanding, we wonder whether comparing MoAI-7B with super-large LLVMs is really a fair comparison. Unfortunately, we do not have abundant GPU resources or funding, so releasing a 34B model is not possible at the current stage. Creating a video that explains the training steps in a simple manner sounds like a great idea. Thanks again!