VamosC / CLIP4STR

An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
Apache License 2.0

inference error #19

Open hshc123 opened 4 days ago

hshc123 commented 4 days ago

When I run `python read.py clip4str_large_3c9d881b88.pt --images_path misc/test_image/`, the following error occurs:

root@e33ba27efab3:/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main# python read.py clip4str_large_3c9d881b88.pt --images_path misc/test_image/
[2024-07-03 13:45:24,525] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Additional keyword arguments: {}

config of VL4STR:
image_freeze_nlayer: -1, text_freeze_nlayer: 6, freeze_language_backbone: False, freeze_image_backbone: False
use_language_model: True, context_length: 16, cross_token_embeding: False, cross_loss_weight: 1.0
use_share_dim: True, image_detach: True, clip_cls_eot_feature: False
cross_gt_context: True, cross_cloze_mask: False, cross_fast_decode: False

Try to load CLIP model from /workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/OpenCLIP-ViT-L-14-DataComp-XL-s13B-b90K.bin

config of VL4STR:
image_freeze_nlayer: -1, text_freeze_nlayer: 6, freeze_language_backbone: False, freeze_image_backbone: False
use_language_model: True, context_length: 16, cross_token_embeding: False, cross_loss_weight: 1.0
use_share_dim: True, image_detach: True, clip_cls_eot_feature: False
cross_gt_context: True, cross_cloze_mask: False, cross_fast_decode: False

Try to load CLIP model from /workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/ViT-L-14.pt
loading checkpoint from /workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/ViT-L-14.pt
The dimension of the visual decoder is 768.
Traceback (most recent call last):
  File "/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/strhub/models/utils.py", line 108, in load_from_checkpoint
    model = ModelClass.load_from_checkpoint(checkpoint_path, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 161, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 203, in _load_model_state
    model = cls(**_cls_kwargs)
  File "/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/strhub/models/vl_str/system.py", line 77, in __init__
    assert os.path.exists(kwargs["clip_pretrained"])
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/read.py", line 54, in <module>
    main()
  File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/read.py", line 37, in main
    model = load_from_checkpoint(args.checkpoint, **kwargs).eval().to(args.device)
  File "/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/strhub/models/utils.py", line 117, in load_from_checkpoint
    model.load_state_dict(checkpoint)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VL4STR:
    Missing key(s) in state_dict: "clip_model.positional_embedding", "clip_model.text_projection", "clip_model.logit_scale", "clip_model.visual.class_embedding", "clip_model.visual.positional_embedding", "clip_model.visual.proj", "clip_model.visual.conv1.weight", "clip_model.visual.ln_pre.weight", "clip_model.visual.ln_pre.bias", "clip_model.visual.transformer.resblocks.0.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.0.attn.in_proj_bias", "clip_mode...........

mzhaoshuai commented 4 days ago

Hi,

Did you set https://github.com/VamosC/CLIP4STR/blob/d18f2f4b98b7e3dc1a59a845a6940997a4e9c09c/strhub/models/vl_str/system.py#L22 properly?

https://github.com/VamosC/CLIP4STR/blob/d18f2f4b98b7e3dc1a59a845a6940997a4e9c09c/strhub/models/vl_str/system.py#L77

This assertion is what raised your AssertionError.
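
In other words, the constructor does an existence check on the CLIP checkpoint path, roughly like the sketch below (paraphrased from your traceback, not the verbatim source; the path is a placeholder):

```python
import os

# Sketch of the failing check (paraphrased from the traceback, not verbatim source).
# system.py#L22 defines CLIP_PATH, the directory holding the pretrained CLIP weights;
# system.py#L77 then asserts that the resolved checkpoint file really exists on disk.
clip_pretrained = "/PUT/YOUR/PATH/HERE/pretrained/clip/ViT-L-14.pt"  # placeholder, replace with your real path
assert os.path.exists(clip_pretrained), f"CLIP checkpoint not found: {clip_pretrained}"
```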

hshc123 commented 2 days ago

> Hi,
>
> Did you set https://github.com/VamosC/CLIP4STR/blob/d18f2f4b98b7e3dc1a59a845a6940997a4e9c09c/strhub/models/vl_str/system.py#L22 properly?
>
> https://github.com/VamosC/CLIP4STR/blob/d18f2f4b98b7e3dc1a59a845a6940997a4e9c09c/strhub/models/vl_str/system.py#L77
>
> This assertion is what raised your AssertionError.


I downloaded three models in total: ViT-L-14.pt, CLIP-ViT-L-14-DataComp.XL-s13B-b90K, and clip4str_large_3c9d881b88.pt, and put them all under the path configured with `# CLIP_PATH = '/PUT/YOUR/PATH/HERE/pretrained/clip'`. After that, the error above occurred.
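
For reference, this is how I checked my layout; the CLIP_PATH value below is inferred from the "Try to load CLIP model from ..." lines in my log, and the filenames are exactly as I downloaded them:

```python
import os

# Directory my CLIP_PATH (system.py#L22) resolves to, inferred from the log above.
CLIP_PATH = '/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main'

# The three files I downloaded, with their original names.
for name in ['ViT-L-14.pt',
             'CLIP-ViT-L-14-DataComp.XL-s13B-b90K',
             'clip4str_large_3c9d881b88.pt']:
    path = os.path.join(CLIP_PATH, name)
    print(path, '-> exists' if os.path.exists(path) else '-> MISSING')
```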

mzhaoshuai commented 2 days ago

Hi, @hshc123

https://github.com/VamosC/CLIP4STR/blob/d18f2f4b98b7e3dc1a59a845a6940997a4e9c09c/strhub/clip/clip.py#L137

If you use OpenCLIP models, please rename CLIP-ViT-L-14-DataComp.XL-s13B-b90K to OpenCLIP-ViT-L-14-DataComp-XL-s13B-b90K.bin.

Please also check your first log line; it already shows the filename the loader expects:

Try to load CLIP model from /workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main/OpenCLIP-ViT-L-14-DataComp-XL-s13B-b90K.bin
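
For example, something like this should work (the directory is an assumption based on the paths in your log; adjust it to wherever your CLIP_PATH points):

```python
import os

# Directory assumed from the paths in the log above -- adjust to your own CLIP_PATH (system.py#L22).
clip_dir = '/workspace/data_dir/data_user/zyy/OCR/CLIP4STR-main'

src = os.path.join(clip_dir, 'CLIP-ViT-L-14-DataComp.XL-s13B-b90K')          # name as downloaded
dst = os.path.join(clip_dir, 'OpenCLIP-ViT-L-14-DataComp-XL-s13B-b90K.bin')  # name the loader tries (see log above)
os.rename(src, dst)
```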