-
Thank you for the great work on Multimodal Chain of Thought and for open-sourcing the code! The results are really impressive. I was wondering if there is any colab notebook or example script to try t…
-
ScienceQA was evaluated in your experiments. As I understand it, ScienceQA is a multimodal benchmark, whereas your model operates purely on text. Could you please …
-
Dear authors,
Thanks for your exciting and solid work.
May I ask why Multimodal Chain-of-Thought is still significantly better than UnifiedQA when there is no visual input (e.g., the text context…
-
### Feature request
We want to be able to give the model the ability to (see the sketch after this list):
1. paint a red dot on its suggested target location
2. look at the screenshot with the dot on it,
3. optionally self cor…
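A minimal sketch of what step 1 might look like, assuming Pillow is available; `mark_target`, the coordinates, and the file names below are hypothetical placeholders, not part of the project's API:

```python
# Hypothetical sketch: overlay a red dot on a screenshot at the model's
# suggested target location, then save the annotated image so it can be
# shown back to the model for an optional second look / self-correction.
from PIL import Image, ImageDraw

def mark_target(screenshot_path: str, x: int, y: int, radius: int = 8) -> Image.Image:
    """Return a copy of the screenshot with a red dot centered at (x, y)."""
    img = Image.open(screenshot_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    draw.ellipse((x - radius, y - radius, x + radius, y + radius), fill="red")
    return img

# Example: the model proposed (640, 360) as the click target.
annotated = mark_target("screenshot.png", 640, 360)
annotated.save("screenshot_with_dot.png")
# Steps 2-3 would feed "screenshot_with_dot.png" back to the model so it can
# confirm or revise the suggested location.
```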
-
> [!TIP]
> ## Want to get involved?
> We'd love it if you did! Please get in contact with the people assigned to this issue, or leave a comment. See general contributing advice [here](https://micros…
-
### Discussed in https://github.com/awslabs/autogluon/discussions/2216
Originally posted by **dcapeluto** October 15, 2022
There are examples of autogluon leveraging rapidsai. When installing …
-
This issue is for notifications about papers that will be added to this repo in the future
-
Hi there,
I'm very interested in your project and the dataset you used.
Could you please provide access to the original dataset? I would greatly appreciate it and will be sure to credit your work…
-
- [LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day](https://arxiv.org/abs/2306.00890)
- [MEDITRON-70B: Scaling Medical Pretraining for Large Language Models](http…
-
* [Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models](https://arxiv.org/abs/2406.09403)
Humans draw to facilitate reasoning: we draw auxiliary lines when solving…