SiyuanHuang95 / ManipVQA

[IROS24 Oral] ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models

Publish trained model of this project with smaller size #6

Closed nacui-intel closed 3 months ago

nacui-intel commented 4 months ago

The trained model published on Hugging Face for this project is still too large for a client node with limited memory to run inference. Could you share a smaller model, e.g. 7B or even smaller, for users? Thanks.
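For context on the memory constraint, a back-of-envelope estimate shows why a 7B checkpoint is much easier to fit on a single client GPU. The sketch below is illustrative only; the 20% overhead factor for activations and KV cache is a rough assumption, not a measured number.

```python
# Back-of-envelope GPU memory estimate for LLM inference.
# Weights take roughly parameter_count * bytes_per_parameter,
# plus overhead for activations and KV cache (assumed ~20% here).

def estimate_inference_gib(num_params: float, bytes_per_param: int,
                           overhead: float = 0.2) -> float:
    """Return an approximate GPU memory requirement in GiB."""
    weights_bytes = num_params * bytes_per_param
    return weights_bytes * (1 + overhead) / (1024 ** 3)

# A 7B-parameter model in fp16 (2 bytes/param):
print(f"{estimate_inference_gib(7e9, 2):.1f} GiB")  # ~15.6 GiB
```

By the same estimate, a 13B model in fp16 would need roughly twice that, which is why the 7B (or a quantized variant) is the practical choice for memory-limited nodes.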

SiyuanHuang95 commented 4 months ago

Thanks for your interest in our project.

7B is now on our plan list! :-)

We'd like to know whether you want the pure ManipVQA version or the advanced version, where the model combines the abilities of ManipVQA and A3VLM.

nacui-intel commented 4 months ago

> Thanks for your interest in our project.
>
> 7B is now on our plan list! :-)
>
> We'd like to know whether you want the pure ManipVQA version or the advanced version, where the model combines the abilities of ManipVQA and A3VLM.

It would be great if you could provide both; that would make deployment more flexible for different use cases, and we could also evaluate the performance difference between them. Thanks a lot.

SiyuanHuang95 commented 4 months ago

Okay, the pure ManipVQA version is on the way; we will provide it with a single-GPU inference demo in a few days.

SiyuanHuang95 commented 3 months ago

@nacui-intel The 7B model is now published; please check the updates in the README.

If you run into any problems, please let me know.
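For readers loading the released 7B checkpoint on a single memory-limited GPU, here is a minimal sketch using the Hugging Face `transformers` API. The repo ID below is a placeholder (see the README for the actual one), and the kwargs are generic memory-saving options, not project-specific settings.

```python
# Sketch: kwargs commonly used to fit a 7B model on a single GPU.
# These are generic transformers options, not ManipVQA-specific settings.

def loading_kwargs(low_memory: bool = True) -> dict:
    """Build from_pretrained kwargs for single-GPU, half-precision loading."""
    kwargs = {
        "torch_dtype": "float16",  # halve weight memory vs. fp32
        "device_map": "auto",      # let accelerate place layers on the GPU
    }
    if low_memory:
        kwargs["low_cpu_mem_usage"] = True  # avoid a full fp32 copy in RAM
    return kwargs

if __name__ == "__main__":
    # Heavy imports and download kept out of the helper above.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "SiyuanHuang95/ManipVQA-7B",  # placeholder repo ID; check the README
        **loading_kwargs(),
    )
```

Using `device_map="auto"` requires the `accelerate` package; on nodes where even fp16 is too large, quantized loading (e.g. via `bitsandbytes`) is a further option.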