-
### Search before asking
- [X] I have searched the Roboflow Notebooks [issues](https://github.com/roboflow-ai/notebooks/issues) and found no similar bug report.
### Notebook name
[Fine-tuning Flor…
-
Thank you for the incredible set of repositories (this one and prismatic-vlms). They have been a great joy to use: very well designed, configurable, and easy for researchers to work with.
I'm running in…
-
I am very glad that someone has finally realized that keeping an image resolution of 224 or 336 is not enough to build strong VLMs for complicated vision tasks such as detection and counting.
Do you h…
-
Hello, I would like to ask how to test on the M-Paper dataset. For example, for the Multimodal Diagram Analysis task, the input needs to be Context + Diagrams + Outline, and the question instructions, s…
-
Hi! Thanks for your great work! I am curious about how to get multi-view object images in the "Object Caption" step of your annotation pipeline. It seems that only a 3D point cloud and object bounding…
-
1. all datasets are downloaded
2. all requirements are installed
3. all dependency repos are prepared
-
Thanks for sharing your great work!
I have a few questions about your work, especially regarding the baselines.
1. Did you fine-tune the VLMs reported in Table 1? I got confused because Section 3.…
-
### Feature request
Allow passing past key values during a forward pass of more than one token, as text-only large language models do.
### Motivation
According to the documentation [here](…
-
### Feature request
It would be nice to get a standard `AutoModel` class for `image-text-to-text` models (since @molbap is standardizing the processor)
### Motivation
@NielsRogge noticed tha…
-
I can see that there are multiple issues of the form "add X as a new OCR engine":
- #17
- #18
- #19
- #36
... therefore, would it be sensible to document the steps and/or rearchitect such that t…