amazon-science / mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
https://arxiv.org/abs/2302.00923
Apache License 2.0

Request for Release of Multimodal-CoT Large 738M Model #65

Open Amyyyyeah opened 10 months ago

Amyyyyeah commented 10 months ago

I've recently come across your paper detailing the impressive capabilities of the Multimodal-CoT Large 738M model, particularly its reported accuracies across the ScienceQA categories (95.91, 82.00, 90.82, 95.26, 88.80, 92.89, 92.44, 90.31, and 91.68 on average).

I am writing to inquire about the possibility of a public release, because we have noticed that the checkpoint available via GitHub reaches 90.45, which differs from the 91.68 average reported in your paper. Access to the paper's model could significantly aid ongoing research and development efforts in our field.

Thank you for your time and your contributions to the field. I look forward to your response and the opportunity to work with this innovative model.

dingning97 commented 8 months ago

Hi. Were you able to reproduce the 91.68% accuracy with the T5-large model? I tried to reproduce the experiments with the "declare-lab/flan-alpaca-large" model, but only got ~90.5% accuracy on the ScienceQA test set.
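
In case it helps others compare setups, here is a minimal sketch of loading the same backbone from the Hub to confirm its size. This is an illustration only: it loads the base "declare-lab/flan-alpaca-large" model, not the fine-tuned mm-cot weights.

```python
# Sketch: load the "declare-lab/flan-alpaca-large" backbone to check the
# parameter count being compared (this is NOT the mm-cot checkpoint).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("declare-lab/flan-alpaca-large")
model = AutoModelForSeq2SeqLM.from_pretrained("declare-lab/flan-alpaca-large")
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")
```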

1-sf commented 8 months ago

Hi @dingning97 and @Amyyyyeah, I also get a similar average accuracy of 90.45%. The checkpoint at https://huggingface.co/cooelf/mm-cot/tree/main reaches a similar accuracy, which is lower than the one reported in the paper.
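
For reference, here is a minimal sketch of pulling the released weights from that Hugging Face repo. The layout of the checkpoint folders inside the repo is an assumption, so list the files first before pointing the evaluation script at any particular directory.

```python
# Sketch: download the released mm-cot weights from the Hugging Face Hub and
# list what was fetched. The internal checkpoint layout is an assumption --
# inspect it, then pass the right folder to the repo's evaluation entry point.
import os
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="cooelf/mm-cot")
for root, _, files in os.walk(local_dir):
    for name in files:
        print(os.path.join(root, name))
```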

First of all, thanks to the authors for such an innovative idea. It would be great if the authors could release the model weights, which would be very beneficial for people like us.

@cooelf @astonzhang

cooelf commented 4 months ago

Hi guys, thanks for your interest. The released models are ones I reproduced with limited computation resources after my internship finished. It should be possible to obtain better results with more hyper-parameter searching.

BTW, we were encouraged to see an improvement from the updated base model compared with the original one. We will update the paper with the latest results based on our released models, for consistency.