X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

Video checkpoint available? #95

Closed: nullnameno closed this issue 1 year ago

nullnameno commented 1 year ago

Hello, I noticed the statement that 'video checkpoints are available on HuggingFace', but I couldn't find them. Have you run into any problems? When will they be available?

LukeForeverYoung commented 1 year ago

Our server is currently experiencing connectivity issues with HuggingFace, so we're unable to upload the checkpoint to the HF Hub for the time being. We'll upload it as soon as the issues are resolved.

nullnameno commented 1 year ago

Thanks! When will it be uploaded to ModelScope?

LinB203 commented 1 year ago

If I run the video code, it returns the error below. It seems that there is no checkpoint in the HuggingFace repo. When will the checkpoint be released? Thanks!

OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like MAGAer13/mplug-owl-llama-7b-video is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

MAGAer13 commented 1 year ago

Yes, we have not yet uploaded our checkpoint to HF. We will do so this week.

ceyxasm commented 1 year ago

@MAGAer13, does this also affect the local video inference section? https://github.com/x-plug/mplug-owl#video-inference

MAGAer13 commented 1 year ago

This means you cannot load the weights. You can still run the model without a checkpoint, but that is not recommended.

ceyxasm commented 1 year ago

Okay, so my error looks like this. Any leads?

OSError: It looks like the config file at 'MAGAer13/mplug-owl-llama-7b-video' is not a valid JSON file.

MAGAer13 commented 1 year ago

You can use the config from mplug-owl-llama-7b; the architecture is similar. We will upload it tomorrow (I hope).
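
For anyone hitting either of the two OSErrors in this thread, a small pre-check on a local model directory can show whether config.json is missing or simply not valid JSON before the model is loaded. This is only a sketch using the Python standard library: the helper name check_config and the directory path in the usage lines are hypothetical and are not part of the mPLUG-Owl repo or of transformers.

```python
import json
from pathlib import Path


def check_config(model_dir):
    """Return (ok, message) describing the state of model_dir/config.json.

    Hypothetical helper for illustration; not part of mPLUG-Owl.
    """
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        # Corresponds to the "not the path to a directory containing
        # a file named config.json" case.
        return False, f"missing: {config_path}"
    try:
        with config_path.open(encoding="utf-8") as f:
            config = json.load(f)
    except json.JSONDecodeError as err:
        # Corresponds to the "is not a valid JSON file" case.
        return False, f"invalid JSON: {err}"
    if not isinstance(config, dict):
        return False, "config.json does not contain a JSON object"
    return True, "ok"


if __name__ == "__main__":
    # Assumed local path; replace with wherever you placed the files.
    ok, message = check_config("./mplug-owl-llama-7b-video")
    print(ok, message)
```

If the check reports a missing or invalid file, copying config.json from a local copy of mplug-owl-llama-7b into that directory is one way to apply the suggestion above.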

LinB203 commented 1 year ago

By the way, could you provide demo code for batched text-video inference? It seems the current demo code only covers batch_size=1.

MAGAer13 commented 1 year ago

We did not test for batch inference. You can use the demo on HF for video inference.

LinB203 commented 1 year ago

So if I want to test many samples, I must use a for-loop with batch_size=1? Like this? for i in range(100000): generated_text = model.forward(one_text, one_video)

MAGAer13 commented 1 year ago

Multi-sample (batched) inference is not guaranteed to work. Training does support batching, but the generation procedure differs from the training one.
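
Until batched generation is confirmed, the one-sample-at-a-time loop discussed above can be kept separate from the model call. The following is a minimal sketch, not the repo's API: generate_one_by_one and fake_generate are hypothetical names, and fake_generate is only a stand-in for your own single-sample wrapper around the video inference demo.

```python
from typing import Callable, Iterable, List, Tuple


def generate_one_by_one(
    samples: Iterable[Tuple[str, str]],
    generate_fn: Callable[[str, str], str],
) -> List[str]:
    """Call generate_fn once per (prompt, video_path) pair (batch_size=1)."""
    outputs = []
    for prompt, video_path in samples:
        # One forward/generation call per sample; no batching is attempted.
        outputs.append(generate_fn(prompt, video_path))
    return outputs


def fake_generate(prompt: str, video_path: str) -> str:
    """Stand-in for a real single-sample inference function (assumption)."""
    return f"{prompt} -> {video_path}"


if __name__ == "__main__":
    pairs = [("Describe the video.", "a.mp4"), ("What happens?", "b.mp4")]
    for text in generate_one_by_one(pairs, fake_generate):
        print(text)
```

Passing the generation function in as an argument means the loop stays unchanged if a tested batched path becomes available later.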

MAGAer13 commented 1 year ago

Hi all, the video checkpoint has been released.

shaswati1 commented 10 months ago

@MAGAer13, where can I find the video checkpoint (the one on HF doesn't work)? I was able to run inference by following the video inference instructions in this link. However, I want to fine-tune this model on my own dataset, which includes video and text, and I found the training pipeline a bit confusing. Could you give me a pointer (e.g., which file I should look at)?

MAGAer13 commented 10 months ago

The video checkpoint is here (it should work now). If it does not work, see #101 for the checkpoint. Then change the model in the training pipeline by replacing import mplug_owl with import mplug_owl_video, which swaps the image model architecture for the video model architecture.

shaswati1 commented 10 months ago

@MAGAer13, which file should I look at for the training pipeline itself? This one?

MAGAer13 commented 10 months ago

See train_it.sh and ./pipeline/train.py