Vision-CAIR / MiniGPT4-video

Official code for the Goldfish model (long-video understanding) and MiniGPT4-video (short-video understanding)
https://vision-cair.github.io/Goldfish_website/
BSD 3-Clause "New" or "Revised" License

Improve discoverability + fix download stats on Hugging Face #31

Open NielsRogge opened 1 month ago

NielsRogge commented 1 month ago

Hi,

Niels here from the open-source team at Hugging Face. It's great to see you're releasing models + data on HF, I discovered your work through the paper page: https://huggingface.co/papers/2407.12679.

However, there are a couple of things that could improve the discoverability of your models and make sure the download stats work.

Dataset

The dataset itself could be linked to the paper, see here on how to do that: https://huggingface.co/docs/hub/en/datasets-cards#linking-a-paper

Download stats

I see that download stats currently aren't working for your models. This is because the model repository contains various checkpoints and no config.json file. See here for more info: https://huggingface.co/docs/hub/models-download-stats.

There are a few options here to make them work:

1) Either we can open a PR on the huggingface.js repository to view "MiniGPT4-video" as a library, for which we add dedicated support. See this PR as an example: https://github.com/huggingface/huggingface.js/pull/784

2) One could leverage the PyTorchModelHubMixin to push model checkpoints to separate model repositories, each containing a config.json and a safetensors file.

Usually we recommend having a single repository per checkpoint.

Discoverability

Moreover, the discoverability of your models could be improved by adding tags to the model card here: https://huggingface.co/Vision-CAIR/MiniGPT4-Video, like "video-text-to-text", which helps people find the model when filtering on hf.co/models.
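For reference, tags and the pipeline tag live in the YAML metadata block at the top of the model card's README.md; a minimal sketch (the tag values below are illustrative):

```yaml
---
pipeline_tag: video-text-to-text
license: bsd-3-clause
tags:
  - video
  - minigpt4
---
```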

Let me know if you need any help regarding this!

Cheers,

Niels
ML Engineer @ HF 🤗

KerolosAtef commented 1 month ago

Hello @NielsRogge,

Thank you for your interest in our work.

I have linked the dataset to the paper for now and plan to update the dataset card later, as I'm currently quite busy.

As I am new to Hugging Face tools, could you assist me in integrating MiniGPT4_video so that it can be downloaded using the from_pretrained function and be included among the models supported by Hugging Face?

If this is what you mean by a PR to integrate MiniGPT4_video as a library, it would be greatly appreciated, and I will ensure it gets merged.

Please let me know if there's anything else I can do to help.

NielsRogge commented 1 month ago

Hi @KerolosAtef,

Thanks for linking the dataset to the paper and uploading the model.

I see that you pushed a commit to leverage the PyTorchModelHubMixin class. Now inference should work as follows:

```python
from minigpt4.models.mini_gpt4_llama_v2 import MiniGPT4_llama_v2

model = MiniGPT4_llama_v2.from_pretrained("Vision-CAIR/MiniGPT4-Video")
```

Could you confirm that this works? In that case, we could update the model card to include this code snippet to showcase how to get started with the model, and update the demo code of the Space to also leverage from_pretrained.

Cheers!

KerolosAtef commented 1 month ago

Hello @NielsRogge, yes, I tried that yesterday, but unfortunately it is not working. Could you help me with that?

```python
from minigpt4.models.mini_gpt4_llama_v2 import MiniGPT4_llama_v2

model = MiniGPT4_llama_v2.from_pretrained("Vision-CAIR/MiniGPT4-Video")
```


```python
# by using AutoModel
from transformers import AutoModel

model = AutoModel.from_pretrained("Vision-CAIR/MiniGPT4-Video")
```
This raised an error as well. I think I did something wrong while uploading the model; could you help me with that? Thanks in advance!

NielsRogge commented 1 month ago

Hi,

It depends on how you want the model to integrate with the Hub. cc @wauplin

Option 1

If you want people to use the MiniGPT4-video code base with Hub integration, it's advised to do the following:

```python
from minigpt4.models.mini_gpt4_llama_v2 import MiniGPT4_llama_v2

model = MiniGPT4_llama_v2(...)

# equip with weights
model.load_state_dict(...)

# push to the hub
model.push_to_hub("...")
```

Then you should be able to reload it as:

```python
from minigpt4.models.mini_gpt4_llama_v2 import MiniGPT4_llama_v2

model = MiniGPT4_llama_v2.from_pretrained("Vision-CAIR/MiniGPT4-Video")
```

Regarding the error that you have above, could you double check whether the config was properly serialized? https://huggingface.co/Vision-CAIR/MiniGPT4-Video/blob/main/config.json

There might be an issue there that prevents re-instantiating the model.
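A common failure mode here is an `__init__` kwarg that isn't JSON-serializable, since `PyTorchModelHubMixin` rebuilds the model from what config.json can store. A quick local sanity check (the kwarg names below are made up for illustration, not the actual MiniGPT4-video signature):

```python
# Every __init__ kwarg that should survive from_pretrained must round-trip
# through JSON, because config.json is all the mixin has at load time.
# Kwarg names here are illustrative.
import json

init_kwargs = {"model_type": "minigpt4_video", "max_txt_len": 256, "low_resource": True}
restored = json.loads(json.dumps(init_kwargs))
print(restored == init_kwargs)
```

A non-serializable value (e.g. a torch.device or a model object) would make `json.dumps` raise instead, which is worth ruling out before debugging further.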

Option 2

In case you want people to use your model through the Transformers library (and make it usable through one of the Auto Classes like AutoModel), then you can follow this guide: https://huggingface.co/docs/transformers/custom_models. It requires registering your model and pushing the code to the hub.
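A hedged sketch of that custom-code route, under the assumptions of the linked guide (class names and shapes below are illustrative, not the repo's actual API): define a `PretrainedConfig`/`PreTrainedModel` pair, register them for the Auto classes, and push, so that the defining `.py` files are uploaded next to the weights.

```python
# Illustrative sketch of the "custom models" flow; the real MiniGPT4-video
# config/model classes would go where the toy ones are.
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel

class MiniGPT4VideoConfig(PretrainedConfig):
    model_type = "minigpt4_video"

    def __init__(self, hidden_size: int = 32, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)

class MiniGPT4VideoModel(PreTrainedModel):
    config_class = MiniGPT4VideoConfig

    def __init__(self, config):
        super().__init__(config)
        self.proj = nn.Linear(config.hidden_size, config.hidden_size)

# Registering for the Auto classes tells push_to_hub to upload the code files
# alongside the weights, so others can load the model with
# AutoModel.from_pretrained(repo_id, trust_remote_code=True).
MiniGPT4VideoConfig.register_for_auto_class()
MiniGPT4VideoModel.register_for_auto_class("AutoModel")

model = MiniGPT4VideoModel(MiniGPT4VideoConfig())
# model.push_to_hub("...")  # uploads code + config + weights
```

Note that for the code upload to work, the classes must live in a standalone `.py` file (not a notebook's `__main__`), per the guide.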

Let me know what you prefer!

KerolosAtef commented 1 month ago

Hello @NielsRogge, I fixed the bug with option 1 and now it works:

```python
from minigpt4.models.mini_gpt4_llama_v2 import MiniGPT4_Video, minigpt4_video_config

model = MiniGPT4_Video.from_pretrained("Vision-CAIR/MiniGPT4-video-new")
```

I also need to do option 2, but I got stuck after this step: I don't know how to publish the registration.

```python
from minigpt4.models.mini_gpt4_llama_v2 import MiniGPT4_Video, minigpt4_video_config
from transformers import AutoConfig, AutoModel, AutoTokenizer

model = MiniGPT4_Video.from_pretrained("Vision-CAIR/MiniGPT4-video-new")
AutoConfig.register("minigpt4_video", minigpt4_video_config)
AutoModel.register(minigpt4_video_config, MiniGPT4_Video)
```

Note that minigpt4_video is the model_type in config.json and minigpt4_video_config is the configuration class. I need to know how to push the code to the hub.