zeal-up opened this issue 4 weeks ago
Can you check the discussion and update the code, then try again? It's weird that the vision tower is being loaded with a CLIP model's architecture.
same issue here
Loading vision tower: /home/docker/.cache/huggingface/manually_hub/siglip-so400m-patch14-384
It seems that you're loading your vision tower from a local path, in which case it gets loaded as a CLIP vision tower rather than a SigLIP vision tower.
Exchanging the `if` and `elif` predicates may solve the problem, but note that this modification may have side effects on other vision towers; see the sketch below.
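For concreteness, here is a minimal runnable sketch of the dispatch in question (the stand-in tower classes and the exact predicates are my assumptions; the real logic lives in `llava/model/multimodal_encoder/builder.py` and may differ in your revision):

```python
import os

# Stand-ins for the real tower classes under llava/model/multimodal_encoder/;
# they exist only to make this sketch self-contained and runnable.
class CLIPVisionTower:
    def __init__(self, name, **kwargs):
        self.name = name

class SigLipVisionTower:
    def __init__(self, name, **kwargs):
        self.name = name

def build_vision_tower(vision_tower, **kwargs):
    # Original (buggy) ordering, simplified: a local directory passes the
    # os.path.exists() test and is routed to CLIPVisionTower even when the
    # directory actually holds SigLIP weights:
    #
    #   if vision_tower.startswith("openai") or os.path.exists(vision_tower):
    #       return CLIPVisionTower(vision_tower, **kwargs)
    #   elif "siglip" in vision_tower:
    #       return SigLipVisionTower(vision_tower, **kwargs)

    # Swapped ordering: test the "siglip" substring first, so a local SigLIP
    # path is dispatched correctly. This relies on the path containing
    # "siglip" and could misroute other towers whose names contain it.
    if "siglip" in vision_tower.lower():
        return SigLipVisionTower(vision_tower, **kwargs)
    elif vision_tower.startswith("openai") or os.path.exists(vision_tower):
        return CLIPVisionTower(vision_tower, **kwargs)
    raise ValueError(f"Unknown vision tower: {vision_tower}")

path = "/home/docker/.cache/huggingface/manually_hub/siglip-so400m-patch14-384"
print(type(build_vision_tower(path)).__name__)  # -> SigLipVisionTower
```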
This fix works well for me, but I'm unsure why generating with `output_attentions` causes NaNs and the output `["!"]`. There is no such issue with the 0.5B and 72B models.
```python
output_ids = model.generate(
    input_ids,
    images=image_tensors,
    attention_mask=attention_masks,
    pad_token_id=tokenizer.pad_token_id,
    use_cache=True,
    # Uncommenting these two flags triggers the NaN / ["!"] output:
    # output_attentions=True, return_dict_in_generate=True,
    **gen_kwargs,
)
```
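If it helps anyone narrow this down: my guess is that requesting `output_attentions=True` forces a fallback from SDPA/flash-attention to the eager attention path, and the returned attention weights then overflow in fp16/bf16. A debugging sketch (assuming a plain Hugging Face loader rather than LLaVA's `load_pretrained_model`; the checkpoint path is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Debugging sketch: load in full precision with the eager attention kernel,
# then retry the generate(...) call above with output_attentions=True.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/checkpoint",          # placeholder for your local model directory
    torch_dtype=torch.float32,     # rules out fp16/bf16 overflow producing NaNs
    attn_implementation="eager",   # only the eager path can return attention maps
    trust_remote_code=True,        # LLaVA checkpoints may ship custom code
)
```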
Hi, I have recently been trying to use the llava_onevision model, following the OneVision tutorial, which seems pretty easy. I ran the program exactly as in the tutorial, with the 0.5b_si model. However, a ValueError was raised when loading the checkpoint.
It seems that no one has reported this issue yet.
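In case it's useful to others, a quick sanity check (just a sketch; the path is the one from the log earlier in this thread) is to inspect the local directory's config and confirm the checkpoint itself really is SigLIP before it reaches the tower builder:

```python
from transformers import AutoConfig

# Path copied from the "Loading vision tower" log line above.
cfg = AutoConfig.from_pretrained(
    "/home/docker/.cache/huggingface/manually_hub/siglip-so400m-patch14-384"
)
# Expect something like "SiglipConfig" / "siglip". If this checks out but the
# builder still logs a CLIP tower, the dispatch (not the checkpoint) is at fault.
print(type(cfg).__name__, getattr(cfg, "model_type", None))
```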