Closed Luo-Z13 closed 9 months ago
tiny-llava-v1-hf is our legacy model and is compatible with native Hugging Face Transformers, as its weights have been converted to the hf implementation. If you want to load the legacy model, please check our model card.
To use TinyLLaVA-3.1B, we have updated our README. The warnings can be ignored, as they do not affect performance (they are hf integration warnings).
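(For reference, a minimal sketch of running the legacy tiny-llava-v1-hf checkpoint with native transformers via the image-to-text pipeline; the prompt template and generation settings below are illustrative, so check the model card for the exact recommended format.)

```python
# Minimal sketch: running the legacy tiny-llava-v1-hf checkpoint with native
# transformers. The prompt template and max_new_tokens are illustrative; see
# the model card for the recommended format.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-to-text",
    model="bczhou/tiny-llava-v1-hf",
    device=0,
    torch_dtype=torch.float16,
)

result = pipe(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    prompt="USER: <image>\nWhat is shown in this image?\nASSISTANT:",
    generate_kwargs={"max_new_tokens": 100},
)
print(result[0]["generated_text"])
```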
Thank you very much! However, I encountered the following error during inference. Do I need to set up the environment from TinyLLaVA?
```
[2024-02-24 12:46:22,640] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You are using a model of type siglip_vision_model to instantiate a model of type clip_vision_model. This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):
  File "/media/dell/data1/TinyLLaVABench/inference_tiny_llava.py", line 133, in <module>
    tokenizer, model, image_processor, context_len = load_pretrained_model(
  File "/media/dell/data1/TinyLLaVABench/tinyllava/model/builder.py", line 127, in load_pretrained_model
    model = TinyLlavaPhiForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
  File "/media/dell/data1/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3594, in from_pretrained
    no_split_modules = model._get_no_split_modules(device_map)
  File "/media/dell/data1/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1690, in _get_no_split_modules
    raise ValueError(
ValueError: TinyLlavaPhiForCausalLM does not support `device_map='auto'`. To implement support, the model class needs to implement the `_no_split_modules` attribute.
```
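(For anyone hitting the same ValueError: a hedged workaround sketch, not the maintainers' fix, which later in the thread is to install the repo's environment. Newer transformers releases require `_no_split_modules` whenever `device_map` is a string such as "auto", so loading onto a single explicit device sidesteps that check. The import path below is assumed from the repo layout shown in the traceback and may differ.)

```python
# Hedged workaround sketch: pass a concrete device map instead of "auto" so
# from_pretrained never calls _get_no_split_modules (that check only runs for
# string device maps). The import path is an assumption based on the traceback.
import torch
from tinyllava.model import TinyLlavaPhiForCausalLM  # assumed export; builder.py uses this class

model = TinyLlavaPhiForCausalLM.from_pretrained(
    "bczhou/TinyLLaVA-3.1B",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map={"": 0},  # place the whole model on GPU 0 rather than sharding with "auto"
)
```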
Sorry for the delay. We have updated the instructions on how to install the relevant environments and packages here. ❤️
Thanks for your timely reply. Another new error:
```
- This IS expected if you are initializing TinyLlavaPhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TinyLlavaPhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "/media/dell/data1/TinyLLaVABench/predict_RS_img_background_generation_tiny_llava.py", line 133, in <module>
    tokenizer, model, image_processor, context_len = load_pretrained_model(
  File "/media/dell/data1/TinyLLaVABench/tinyllava/model/builder.py", line 145, in load_pretrained_model
    vision_tower.load_model()
  File "/media/dell/data1/TinyLLaVABench/tinyllava/model/multimodal_encoder/clip_encoder.py", line 25, in load_model
    self.image_processor = CLIPImageProcessor.from_pretrained(self.vision_tower_name)
  File "/media/dell/data1/miniconda3/envs/glamm/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 206, in from_pretrained
    image_processor_dict, kwargs = cls.get_image_processor_dict(pretrained_model_name_or_path, **kwargs)
  File "/media/dell/data1/miniconda3/envs/glamm/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 335, in get_image_processor_dict
    resolved_image_processor_file = cached_file(
  File "/media/dell/data1/miniconda3/envs/glamm/lib/python3.10/site-packages/transformers/utils/hub.py", line 356, in cached_file
    raise EnvironmentError(
OSError: /media/dell/data1/pretrain_weights/SigLIP does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co//media/dell/data1/pretrain_weights/SigLIP/main' for available files.
```
It seems preprocessor_config.json is missing from https://huggingface.co/bczhou/TinyLLaVA-3.1B-SigLIP/tree/main.
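(A quick, hedged way to confirm which files that checkpoint actually ships, using huggingface_hub:)

```python
# List the files in the SigLIP checkpoint repo to confirm whether
# preprocessor_config.json is actually present before pointing the builder at it.
from huggingface_hub import list_repo_files

files = list_repo_files("bczhou/TinyLLaVA-3.1B-SigLIP")
print(files)
print("preprocessor_config.json" in files)
```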
I suggest adding a link to the SigLIP weights in your repo, similar to https://github.com/BAAI-DCAI/Bunny/blob/main/README.md#support-models, to make it clearer. Thank you again for your promptness and patience.
It seems that the build_vision_tower function from tinyllava/model/multimodal_encoder/builder.py identifies the weights you provided as a CLIPVisionTower, which causes the error. Try renaming the weights directory to ".../siglip", and it should be fixed.
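(To make the failure mode concrete, a hedged, self-contained sketch of the kind of name-based dispatch described above; the class names are illustrative stand-ins, not the repo's actual implementation.)

```python
# Illustrative stand-ins for the repo's vision tower classes; not the real code.
class CLIPVisionTowerStub:
    def __init__(self, path): self.path = path

class SigLipVisionTowerStub:
    def __init__(self, path): self.path = path

def build_vision_tower_sketch(vision_tower_path: str):
    # A case-sensitive substring check like this treats a folder named
    # ".../SigLIP" as a CLIP tower, which then goes looking for a CLIP-style
    # preprocessor_config.json and fails; renaming the folder to ".../siglip"
    # routes it to the SigLIP branch instead.
    if "siglip" in vision_tower_path:
        return SigLipVisionTowerStub(vision_tower_path)
    return CLIPVisionTowerStub(vision_tower_path)

print(type(build_vision_tower_sketch("/media/dell/data1/pretrain_weights/SigLIP")).__name__)  # CLIPVisionTowerStub
print(type(build_vision_tower_sketch("/media/dell/data1/pretrain_weights/siglip")).__name__)  # SigLipVisionTowerStub
```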
I rebuilt the environment under TinyLLaVA and changed the path of the CLIPVisionTower from https://huggingface.co/bczhou/TinyLLaVA-3.1B-SigLIP to https://huggingface.co/google/siglip-so400m-patch14-384, and now inference works 😄.
Could I directly use https://huggingface.co/google/siglip-so400m-patch14-384? It seems the vision (CLIP/SigLIP) part was fine-tuned in the paper, but I can still get results that appear to be correct. @baichuanzhou
We found that it was the builder function from tinyllava/model/multimodal_encoder/builder.py that caused your error, and we have fixed it. The vision model we uploaded was fine-tuned by us and differs from Google's version, so to reproduce the results from the paper, you should use ours.
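(If useful, a hedged sketch of pulling both towers for comparison; it assumes both checkpoints load with transformers' SiglipVisionModel, as suggested by the siglip_vision_model warning earlier in this thread, and leaves out the repo's own tower wrapper and image preprocessing.)

```python
# Hedged sketch: load the fine-tuned vision tower and the base one side by side.
# Assumes both checkpoints load as SiglipVisionModel. Use the fine-tuned
# checkpoint to reproduce the paper's numbers.
from transformers import SiglipVisionModel

finetuned_tower = SiglipVisionModel.from_pretrained("bczhou/TinyLLaVA-3.1B-SigLIP")
base_tower = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
print(type(finetuned_tower).__name__, type(base_tower).__name__)
```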
Thank you, it's working properly now.
Same problem here. How can I solve it?
I tried the preprocessor_config.json from https://huggingface.co/bczhou/tiny-llava-v1-hf/blob/main/preprocessor_config.json, but the sizes do not match. @baichuanzhou could you help figure it out?
Which model type are you using? tiny-llava-v1-hf is our legacy model and cannot be loaded with the load_pretrained_model function. See its model card for how to run inference with it.
I run inference following the Run Inference example, using the files from https://huggingface.co/bczhou/TinyLLaVA-3.1B/tree/main, and it reports the same error.
Did you download the vision encoder? Please tell me how the weights are stored in your file system (e.g., their respective path names). Thanks.
Hello, I would like to know how to perform inference with TinyLLaVA-3.1B. Simply replacing the model_id in the tiny-llava-v1-hf script with TinyLLaVA-3.1B results in an error: 'You are using a model of type tiny_llava_phi to instantiate a model of type llava. This is not supported for all configurations of models and can yield errors.'
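(In case it helps, a hedged sketch of the repo-native loading path for TinyLLaVA-3.1B, rather than reusing the tiny-llava-v1-hf pipeline script; the helper names mirror LLaVA-style code seen in the tracebacks above and may differ between repo versions.)

```python
# Hedged sketch: load TinyLLaVA-3.1B through the repo's own builder instead of
# the legacy hf pipeline script. get_model_name_from_path is assumed to exist
# as in LLaVA-style repos; check the repo's Run Inference example for the
# exact, up-to-date invocation.
from tinyllava.model.builder import load_pretrained_model
from tinyllava.mm_utils import get_model_name_from_path

model_path = "bczhou/TinyLLaVA-3.1B"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
)
```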