LLaVA-VL / LLaVA-NeXT


Missing mm_projector in latest LLaVa #145

Open mrd opened 1 month ago

mrd commented 1 month ago

I am attempting to run the finetune_onevision.sh script. I've gotten many things sorted out but I am stumped by the --pretrain_mm_mlp_adapter argument.

The default value as provided in the script is ./checkpoints/projectors/llavanext-openai_clip-vit-large-patch14-336-Qwen_Qwen2-7B-Instruct-mlp2x_gelu-pretrain_blip558k_plain/mm_projector.bin after expanding the environment variables. I made sure that directory exists but I do not know where to find mm_projector.bin for the newest LLaVa. I have found an issue and discussion regarding this parameter for the previous version of LLaVa, e.g. https://huggingface.co/liuhaotian/llava-v1.5-13b/blob/main/mm_projector.bin

I have also looked for some kind of extract_projector script but that does not seem to exist.
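For reference, here is a rough sketch of what such an extraction helper could look like, assuming the projector weights are stored in the full checkpoint under keys containing mm_projector (the key names and file layout would need to be verified against the actual checkpoint, so treat this as a guess, not the repo's method):

```python
# extract_projector.py -- hypothetical helper, not part of the repo.
# Collects the multimodal projector weights from a sharded safetensors
# checkpoint and saves them in the format expected by --pretrain_mm_mlp_adapter.
import glob

import torch
from safetensors.torch import load_file


def extract_mm_projector(checkpoint_dir: str, out_path: str = "mm_projector.bin") -> None:
    projector = {}
    for shard in sorted(glob.glob(f"{checkpoint_dir}/*.safetensors")):
        for name, tensor in load_file(shard).items():
            if "mm_projector" in name:
                # Strip the module prefix so the keys become "0.weight", "2.bias", ...
                projector[name.split("mm_projector.")[-1]] = tensor
    torch.save(projector, out_path)


if __name__ == "__main__":
    extract_mm_projector("./checkpoints/my-llava-model")  # example path
```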

This seems to be rather important, yet I cannot find any documentation about it at all, apart from the aforementioned GitHub issue and discussion for LLaVa 1.5, even after scouring the web with Google and DuckDuckGo.

I am currently attempting to use the mm_projector.bin downloaded from the link above, from the LLaVa 1.5 liuhaotian archive. Update: this has resulted in a series of size/shape mismatch errors (not surprisingly, really), e.g. size mismatch for 0.weight: copying a param with shape torch.Size([5120, 1024]) from checkpoint, the shape in current model is torch.Size([3584, 1152]).

Please advise.

Asunatan commented 1 month ago

+1

Luodian commented 1 month ago

Thanks for your question. We uploaded them here~

https://huggingface.co/lmms-lab/llava-onevision-projectors/tree/main
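(For anyone else landing here: one way to fetch these is via huggingface_hub. The filename below is only a guess at the repo layout, so check the file listing on the Hub first.)

```python
from huggingface_hub import hf_hub_download

# Downloads the projector file into the local HF cache and returns its path;
# that path can then be passed as --pretrain_mm_mlp_adapter in finetune_onevision.sh.
path = hf_hub_download(
    repo_id="lmms-lab/llava-onevision-projectors",
    filename="mm_projector.bin",  # assumed filename/subfolder; adjust to the actual repo layout
)
print(path)
```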

mrd commented 1 month ago

Great, it gets a little further. One quick question: what are the expected CUDA memory requirements for fine-tuning? I am trying the 7B model on 4x A100s with 40GB each and it runs out of memory. Should that be sufficient, meaning I should look at tweaking some parameters, or is it simply not enough and I should distribute across even more GPUs?

(Note: I am able to get the Qwen2-0.5B-based model up and fine-tuning with this set-up; the script is attached below with example parameters filled in. It ran successfully on 1x A100, using about 24GB of GPU memory in my case, and I also tried 2x and 4x; all runs finished within a few minutes with MAX_STEPS=25.)

finetune_onevision.sh.gz

Luodian commented 1 month ago

Yes, I think you need at least 8x 80GB GPUs to run 7B finetuning.

SrikanthChellappa commented 4 weeks ago

Can you help by uploading the pre-trained adapter file mm_projector.bin for the llama-3.1-8B model? Or, if it is already available, could one of you share the path, please?

mrd commented 3 weeks ago

Note that the output of my finetuned 0.5B model is garbage and I haven't been able to find a solution yet. But perhaps this should now be rolled into #155, which seems to be a similar issue with finetuned models.

zixianma-sf commented 3 weeks ago

Hi, related to this issue, I'm having issues loading the pretrained mm projector from here. The error occurs at line 108 of LLaVA-NeXT/llava/model/llava_arch.py, mm_projector_weights = torch.load(pretrain_mm_mlp_adapter, map_location="cpu"), which triggers:

File "...python3.10/site-packages/torch/serialization.py", line 1246, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)                                             
_pickle.UnpicklingError: invalid load key, 'v'.

I double checked that I have the correct Python (3.10) and torch (2.1.2) versions in my environment. What could be the issue?

Would appreciate any help/pointers -- Thanks!

guoyanan1g commented 3 weeks ago

I also hit an error at that same torch.load(pretrain_mm_mlp_adapter, map_location="cpu") line: KeyError: "filename 'storages' not found"

zixianma-sf commented 3 weeks ago

Issue resolved -- it turned out the mm_projector.bin that got downloaded through hf transformers was just a (Git LFS) pointer file, and downloading the actual file resolved it.
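For anyone hitting the same invalid load key, 'v' error: that usually means the downloaded file is a small text Git LFS pointer rather than the real binary. A quick sanity check (a minimal sketch, assuming the file sits in the current directory):

```python
# Peek at the first bytes: a Git LFS pointer file starts with
# "version https://git-lfs.github.com/spec/v1" instead of pickle/zip magic bytes.
with open("mm_projector.bin", "rb") as f:
    head = f.read(64)

if head.startswith(b"version https://git-lfs"):
    print("This is a Git LFS pointer, not the actual weights -- re-download the real file.")
else:
    print("Looks like real binary data; torch.load should be able to read it.")
```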