hshc123 opened this issue 4 days ago
Hi,
Do you set
You got an AssertionError.
I downloaded three models in total: ViT-L-14.pt, CLIP-ViT-L-14-DataComp.XL-s13B-b90K, and clip4str_large_3c9d881b88.pt. I then put them all under the path configured by CLIP_PATH = '/PUT/YOUR/PATH/HERE/pretrained/clip', and after that the above error occurred.
Hi, @hshc123
If you use OpenCLIP models, please rename CLIP-ViT-L-14-DataComp.XL-s13B-b90K to OpenCLIP-ViT-L-14-DataComp-XL-s13B-b90K.bin.
Please check the first line of your log; it should read:
Try to load CLIP model from /workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/OpenCLIP-ViT-L-14-DataComp-XL-s13B-b90K.bin
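For reference, a minimal sketch of that rename step, assuming the downloaded weights file sits directly in your CLIP_PATH directory (the exact source filename may differ depending on how you downloaded it):

```python
from pathlib import Path

# Replace with your actual CLIP_PATH from the config.
clip_dir = Path('/PUT/YOUR/PATH/HERE/pretrained/clip')

src = clip_dir / 'CLIP-ViT-L-14-DataComp.XL-s13B-b90K'          # name as downloaded (may differ)
dst = clip_dir / 'OpenCLIP-ViT-L-14-DataComp-XL-s13B-b90K.bin'  # name the loader looks for

if src.exists() and not dst.exists():
    src.rename(dst)
    print(f'renamed {src.name} -> {dst.name}')
else:
    print('check that the source file exists and the target name is not already taken')
```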
When I run python read.py clip4str_large_3c9d881b88.pt --images_path misc/test_image/, the following error occurs:
root@e33ba27efab3:/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main# python read.py clip4str_large_3c9d881b88.pt --images_path misc/test_image/
[2024-07-03 13:45:24,525] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Additional keyword arguments: {}
config of VL4STR:
    image_freeze_nlayer: -1, text_freeze_nlayer: 6, freeze_language_backbone: False, freeze_image_backbone: False
    use_language_model: True, context_length: 16, cross_token_embeding: False, cross_loss_weight: 1.0
    use_share_dim: True, image_detach: True, clip_cls_eot_feature: False
    cross_gt_context: True, cross_cloze_mask: False, cross_fast_decode: False
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/read.py", line 54, in <module>
    main()
  File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/read.py", line 37, in main
    model = load_from_checkpoint(args.checkpoint, **kwargs).eval().to(args.device)
  File "/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/strhub/models/utils.py", line 117, in load_from_checkpoint
    model.load_state_dict(checkpoint)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VL4STR:
    Missing key(s) in state_dict: "clip_model.positional_embedding", "clip_model.text_projection", "clip_model.logit_scale", "clip_model.visual.class_embedding", "clip_model.visual.positional_embedding", "clip_model.visual.proj", "clip_model.visual.conv1.weight", "clip_model.visual.ln_pre.weight", "clip_model.visual.ln_pre.bias", "clip_model.visual.transformer.resblocks.0.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.0.attn.in_proj_bias", "clip_mode...
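If the error persists after the rename, one way to narrow it down is to inspect the checkpoint's keys directly. This is a diagnostic sketch, not part of the repo; the nesting under 'state_dict' is an assumption about how the Lightning-style checkpoint is saved:

```python
import torch

# Load the fine-tuned checkpoint on CPU without instantiating the model.
ckpt = torch.load('clip4str_large_3c9d881b88.pt', map_location='cpu')
# Lightning checkpoints usually nest weights under 'state_dict'; fall back to the raw dict.
state_dict = ckpt.get('state_dict', ckpt)

# The VL4STR model expects the CLIP backbone weights under the clip_model.* prefix.
clip_keys = [k for k in state_dict if k.startswith('clip_model.')]
print(f'{len(state_dict)} keys total, {len(clip_keys)} under clip_model.*')
print('first few keys:', list(state_dict)[:5])
```

If clip_keys is empty, the checkpoint itself does not carry the CLIP backbone, which would explain the Missing key(s) error above.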