VamosC / CLIP4STR

An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
Apache License 2.0

Issue with inference #10

Closed · Szransh closed 3 months ago

Szransh commented 3 months ago

Hi, I am trying to perform inference using the following script:

bash code/clip4str/scripts/read.sh 7 clip4str_b_plus.ckpt /home/shreyans/scratch/tata1mg/clip4str_og/code/clip4str/misc/test_image

The error I get is:

Additional keyword arguments: {}
args.checkpoint /home/shreyans/scratch/tata1mg/clip4str_og/output/clip4str_base16x16_d70bde1f2d.ckpt

config of VL4STR: image_freeze_nlayer: 0, text_freeze_nlayer: 6, freeze_language_backbone: False, freeze_image_backbone: False
use_language_model: True, context_length: 16, cross_token_embeding: False, cross_loss_weight: 1.0
use_share_dim: True, image_detach: True
cross_gt_context: True, cross_cloze_mask: False, cross_fast_decode: False

config of VL4STR: image_freeze_nlayer: -1, text_freeze_nlayer: 6, freeze_language_backbone: False, freeze_image_backbone: False
use_language_model: True, context_length: 16, cross_token_embeding: False, cross_loss_weight: 1.0
use_share_dim: True, image_detach: True
cross_gt_context: True, cross_cloze_mask: False, cross_fast_decode: False

loading checkpoint from /home/shreyans/scratch/tata1mg/clip4str_og/pretrained/clip/ViT-B-16.pt
The dimension of the visual decoder is 512.
Traceback (most recent call last):
  File "/DATA/scratch/shreyans/tata1mg/clip4str_og/code/clip4str/strhub/models/utils.py", line 104, in load_from_checkpoint
    model = ModelClass.load_from_checkpoint(checkpoint_path, **kwargs)
  File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 161, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 203, in _load_model_state
    model = cls(**_cls_kwargs)
  File "/DATA/scratch/shreyans/tata1mg/clip4str_og/code/clip4str/strhub/models/vl_str/system.py", line 70, in __init__
    assert os.path.exists(kwargs["clip_pretrained"])
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shreyans/scratch/tata1mg/clip4str_og/code/clip4str/read.py", line 54, in <module>
    main()
  File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/shreyans/scratch/tata1mg/clip4str_og/code/clip4str/read.py", line 37, in main
    model = load_from_checkpoint(args.checkpoint, **kwargs).eval().to(args.device)
  File "/DATA/scratch/shreyans/tata1mg/clip4str_og/code/clip4str/strhub/models/utils.py", line 113, in load_from_checkpoint
    model.load_state_dict(checkpoint)
  File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VL4STR:
    Missing key(s) in state_dict: "clip_model.positional_embedding", "clip_model.text_projection", "clip_model.logit_scale", "clip_model.visual.class_embedding", "clip_model.visual.positional_embedding", "clip_model.visual.proj", "clip_model.visual.conv1.weight", "clip_model.visual.ln_pre.weight", "clip_model.visual.ln_pre.bias", "clip_model.visual.transformer.resblocks.0.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.0.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.0.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.0.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.0.ln_1.weight", "clip_model.visual.transformer.resblocks.0.ln_1.bias", "clip_model.visual.transformer.resblocks.0.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.0.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.0.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.0.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.0.ln_2.weight", "clip_model.visual.transformer.resblocks.0.ln_2.bias", "clip_model.visual.transformer.resblocks.1.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.1.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.1.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.1.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.1.ln_1.weight", "clip_model.visual.transformer.resblocks.1.ln_1.bias", "clip_model.visual.transformer.resblocks.1.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.1.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.1.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.1.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.1.ln_2.weight", "clip_model.visual.transformer.resblocks.1.ln_2.bias", "clip_model.visual.transformer.resblocks.2.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.2.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.2.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.2.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.2.ln_1.weight", "clip_model.visual.transformer.resblocks.2.ln_1.bias", "clip_model.visual.transformer.resblocks.2.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.2.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.2.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.2.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.2.ln_2.weight", "clip_model.visual.transformer.resblocks.2.ln_2.bias", "clip_model.visual.transformer.resblocks.3.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.3.attn.in_proj_bias", 
"clip_model.visual.transformer.resblocks.3.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.3.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.3.ln_1.weight", "clip_model.visual.transformer.resblocks.3.ln_1.bias", "clip_model.visual.transformer.resblocks.3.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.3.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.3.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.3.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.3.ln_2.weight", "clip_model.visual.transformer.resblocks.3.ln_2.bias", "clip_model.visual.transformer.resblocks.4.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.4.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.4.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.4.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.4.ln_1.weight", "clip_model.visual.transformer.resblocks.4.ln_1.bias", "clip_model.visual.transformer.resblocks.4.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.4.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.4.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.4.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.4.ln_2.weight", "clip_model.visual.transformer.resblocks.4.ln_2.bias", "clip_model.visual.transformer.resblocks.5.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.5.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.5.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.5.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.5.ln_1.weight", "clip_model.visual.transformer.resblocks.5.ln_1.bias", "clip_model.visual.transformer.resblocks.5.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.5.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.5.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.5.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.5.ln_2.weight", "clip_model.visual.transformer.resblocks.5.ln_2.bias", "clip_model.visual.transformer.resblocks.6.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.6.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.6.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.6.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.6.ln_1.weight", "clip_model.visual.transformer.resblocks.6.ln_1.bias", "clip_model.visual.transformer.resblocks.6.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.6.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.6.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.6.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.6.ln_2.weight", "clip_model.visual.transformer.resblocks.6.ln_2.bias", "clip_model.visual.transformer.resblocks.7.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.7.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.7.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.7.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.7.ln_1.weight", "clip_model.visual.transformer.resblocks.7.ln_1.bias", "clip_model.visual.transformer.resblocks.7.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.7.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.7.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.7.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.7.ln_2.weight", "clip_model.visual.transformer.resblocks.7.ln_2.bias", 
"clip_model.visual.transformer.resblocks.8.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.8.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.8.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.8.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.8.ln_1.weight", "clip_model.visual.transformer.resblocks.8.ln_1.bias", "clip_model.visual.transformer.resblocks.8.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.8.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.8.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.8.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.8.ln_2.weight", "clip_model.visual.transformer.resblocks.8.ln_2.bias", "clip_model.visual.transformer.resblocks.9.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.9.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.9.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.9.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.9.ln_1.weight", "clip_model.visual.transformer.resblocks.9.ln_1.bias", "clip_model.visual.transformer.resblocks.9.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.9.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.9.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.9.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.9.ln_2.weight", "clip_model.visual.transformer.resblocks.9.ln_2.bias", "clip_model.visual.transformer.resblocks.10.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.10.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.10.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.10.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.10.ln_1.weight", "clip_model.visual.transformer.resblocks.10.ln_1.bias", "clip_model.visual.transformer.resblocks.10.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.10.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.10.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.10.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.10.ln_2.weight", "clip_model.visual.transformer.resblocks.10.ln_2.bias", "clip_model.visual.transformer.resblocks.11.attn.in_proj_weight", "clip_model.visual.transformer.resblocks.11.attn.in_proj_bias", "clip_model.visual.transformer.resblocks.11.attn.out_proj.weight", "clip_model.visual.transformer.resblocks.11.attn.out_proj.bias", "clip_model.visual.transformer.resblocks.11.ln_1.weight", "clip_model.visual.transformer.resblocks.11.ln_1.bias", "clip_model.visual.transformer.resblocks.11.mlp.c_fc.weight", "clip_model.visual.transformer.resblocks.11.mlp.c_fc.bias", "clip_model.visual.transformer.resblocks.11.mlp.c_proj.weight", "clip_model.visual.transformer.resblocks.11.mlp.c_proj.bias", "clip_model.visual.transformer.resblocks.11.ln_2.weight", "clip_model.visual.transformer.resblocks.11.ln_2.bias", "clip_model.visual.ln_post.weight", "clip_model.visual.ln_post.bias", "clip_model.transformer.resblocks.0.attn.in_proj_weight", "clip_model.transformer.resblocks.0.attn.in_proj_bias", "clip_model.transformer.resblocks.0.attn.out_proj.weight", "clip_model.transformer.resblocks.0.attn.out_proj.bias", "clip_model.transformer.resblocks.0.ln_1.weight", "clip_model.transformer.resblocks.0.ln_1.bias", "clip_model.transformer.resblocks.0.mlp.c_fc.weight", "clip_model.transformer.resblocks.0.mlp.c_fc.bias", "clip_model.transformer.resblocks.0.mlp.c_proj.weight", "clip_model.transformer.resblocks.0.mlp.c_proj.bias", 
"clip_model.transformer.resblocks.0.ln_2.weight", "clip_model.transformer.resblocks.0.ln_2.bias", "clip_model.transformer.resblocks.1.attn.in_proj_weight", "clip_model.transformer.resblocks.1.attn.in_proj_bias", "clip_model.transformer.resblocks.1.attn.out_proj.weight", "clip_model.transformer.resblocks.1.attn.out_proj.bias", "clip_model.transformer.resblocks.1.ln_1.weight", "clip_model.transformer.resblocks.1.ln_1.bias", "clip_model.transformer.resblocks.1.mlp.c_fc.weight", "clip_model.transformer.resblocks.1.mlp.c_fc.bias", "clip_model.transformer.resblocks.1.mlp.c_proj.weight", "clip_model.transformer.resblocks.1.mlp.c_proj.bias", "clip_model.transformer.resblocks.1.ln_2.weight", "clip_model.transformer.resblocks.1.ln_2.bias", "clip_model.transformer.resblocks.2.attn.in_proj_weight", "clip_model.transformer.resblocks.2.attn.in_proj_bias", "clip_model.transformer.resblocks.2.attn.out_proj.weight", "clip_model.transformer.resblocks.2.attn.out_proj.bias", "clip_model.transformer.resblocks.2.ln_1.weight", "clip_model.transformer.resblocks.2.ln_1.bias", "clip_model.transformer.resblocks.2.mlp.c_fc.weight", "clip_model.transformer.resblocks.2.mlp.c_fc.bias", "clip_model.transformer.resblocks.2.mlp.c_proj.weight", "clip_model.transformer.resblocks.2.mlp.c_proj.bias", "clip_model.transformer.resblocks.2.ln_2.weight", "clip_model.transformer.resblocks.2.ln_2.bias", "clip_model.transformer.resblocks.3.attn.in_proj_weight", "clip_model.transformer.resblocks.3.attn.in_proj_bias", "clip_model.transformer.resblocks.3.attn.out_proj.weight", "clip_model.transformer.resblocks.3.attn.out_proj.bias", "clip_model.transformer.resblocks.3.ln_1.weight", "clip_model.transformer.resblocks.3.ln_1.bias", "clip_model.transformer.resblocks.3.mlp.c_fc.weight", "clip_model.transformer.resblocks.3.mlp.c_fc.bias", "clip_model.transformer.resblocks.3.mlp.c_proj.weight", "clip_model.transformer.resblocks.3.mlp.c_proj.bias", "clip_model.transformer.resblocks.3.ln_2.weight", "clip_model.transformer.resblocks.3.ln_2.bias", "clip_model.transformer.resblocks.4.attn.in_proj_weight", "clip_model.transformer.resblocks.4.attn.in_proj_bias", "clip_model.transformer.resblocks.4.attn.out_proj.weight", "clip_model.transformer.resblocks.4.attn.out_proj.bias", "clip_model.transformer.resblocks.4.ln_1.weight", "clip_model.transformer.resblocks.4.ln_1.bias", "clip_model.transformer.resblocks.4.mlp.c_fc.weight", "clip_model.transformer.resblocks.4.mlp.c_fc.bias", "clip_model.transformer.resblocks.4.mlp.c_proj.weight", "clip_model.transformer.resblocks.4.mlp.c_proj.bias", "clip_model.transformer.resblocks.4.ln_2.weight", "clip_model.transformer.resblocks.4.ln_2.bias", "clip_model.transformer.resblocks.5.attn.in_proj_weight", "clip_model.transformer.resblocks.5.attn.in_proj_bias", "clip_model.transformer.resblocks.5.attn.out_proj.weight", "clip_model.transformer.resblocks.5.attn.out_proj.bias", "clip_model.transformer.resblocks.5.ln_1.weight", "clip_model.transformer.resblocks.5.ln_1.bias", "clip_model.transformer.resblocks.5.mlp.c_fc.weight", "clip_model.transformer.resblocks.5.mlp.c_fc.bias", "clip_model.transformer.resblocks.5.mlp.c_proj.weight", "clip_model.transformer.resblocks.5.mlp.c_proj.bias", "clip_model.transformer.resblocks.5.ln_2.weight", "clip_model.transformer.resblocks.5.ln_2.bias", "clip_model.transformer.resblocks.6.attn.in_proj_weight", "clip_model.transformer.resblocks.6.attn.in_proj_bias", "clip_model.transformer.resblocks.6.attn.out_proj.weight", "clip_model.transformer.resblocks.6.attn.out_proj.bias", 
"clip_model.transformer.resblocks.6.ln_1.weight", "clip_model.transformer.resblocks.6.ln_1.bias", "clip_model.transformer.resblocks.6.mlp.c_fc.weight", "clip_model.transformer.resblocks.6.mlp.c_fc.bias", "clip_model.transformer.resblocks.6.mlp.c_proj.weight", "clip_model.transformer.resblocks.6.mlp.c_proj.bias", "clip_model.transformer.resblocks.6.ln_2.weight", "clip_model.transformer.resblocks.6.ln_2.bias", "clip_model.transformer.resblocks.7.attn.in_proj_weight", "clip_model.transformer.resblocks.7.attn.in_proj_bias", "clip_model.transformer.resblocks.7.attn.out_proj.weight", "clip_model.transformer.resblocks.7.attn.out_proj.bias", "clip_model.transformer.resblocks.7.ln_1.weight", "clip_model.transformer.resblocks.7.ln_1.bias", "clip_model.transformer.resblocks.7.mlp.c_fc.weight", "clip_model.transformer.resblocks.7.mlp.c_fc.bias", "clip_model.transformer.resblocks.7.mlp.c_proj.weight", "clip_model.transformer.resblocks.7.mlp.c_proj.bias", "clip_model.transformer.resblocks.7.ln_2.weight", "clip_model.transformer.resblocks.7.ln_2.bias", "clip_model.transformer.resblocks.8.attn.in_proj_weight", "clip_model.transformer.resblocks.8.attn.in_proj_bias", "clip_model.transformer.resblocks.8.attn.out_proj.weight", "clip_model.transformer.resblocks.8.attn.out_proj.bias", "clip_model.transformer.resblocks.8.ln_1.weight", "clip_model.transformer.resblocks.8.ln_1.bias", "clip_model.transformer.resblocks.8.mlp.c_fc.weight", "clip_model.transformer.resblocks.8.mlp.c_fc.bias", "clip_model.transformer.resblocks.8.mlp.c_proj.weight", "clip_model.transformer.resblocks.8.mlp.c_proj.bias", "clip_model.transformer.resblocks.8.ln_2.weight", "clip_model.transformer.resblocks.8.ln_2.bias", "clip_model.transformer.resblocks.9.attn.in_proj_weight", "clip_model.transformer.resblocks.9.attn.in_proj_bias", "clip_model.transformer.resblocks.9.attn.out_proj.weight", "clip_model.transformer.resblocks.9.attn.out_proj.bias", "clip_model.transformer.resblocks.9.ln_1.weight", "clip_model.transformer.resblocks.9.ln_1.bias", "clip_model.transformer.resblocks.9.mlp.c_fc.weight", "clip_model.transformer.resblocks.9.mlp.c_fc.bias", "clip_model.transformer.resblocks.9.mlp.c_proj.weight", "clip_model.transformer.resblocks.9.mlp.c_proj.bias", "clip_model.transformer.resblocks.9.ln_2.weight", "clip_model.transformer.resblocks.9.ln_2.bias", "clip_model.transformer.resblocks.10.attn.in_proj_weight", "clip_model.transformer.resblocks.10.attn.in_proj_bias", "clip_model.transformer.resblocks.10.attn.out_proj.weight", "clip_model.transformer.resblocks.10.attn.out_proj.bias", "clip_model.transformer.resblocks.10.ln_1.weight", "clip_model.transformer.resblocks.10.ln_1.bias", "clip_model.transformer.resblocks.10.mlp.c_fc.weight", "clip_model.transformer.resblocks.10.mlp.c_fc.bias", "clip_model.transformer.resblocks.10.mlp.c_proj.weight", "clip_model.transformer.resblocks.10.mlp.c_proj.bias", "clip_model.transformer.resblocks.10.ln_2.weight", "clip_model.transformer.resblocks.10.ln_2.bias", "clip_model.transformer.resblocks.11.attn.in_proj_weight", "clip_model.transformer.resblocks.11.attn.in_proj_bias", "clip_model.transformer.resblocks.11.attn.out_proj.weight", "clip_model.transformer.resblocks.11.attn.out_proj.bias", "clip_model.transformer.resblocks.11.ln_1.weight", "clip_model.transformer.resblocks.11.ln_1.bias", "clip_model.transformer.resblocks.11.mlp.c_fc.weight", "clip_model.transformer.resblocks.11.mlp.c_fc.bias", "clip_model.transformer.resblocks.11.mlp.c_proj.weight", "clip_model.transformer.resblocks.11.mlp.c_proj.bias", 
"clip_model.transformer.resblocks.11.ln_2.weight", "clip_model.transformer.resblocks.11.ln_2.bias", "clip_model.token_embedding.weight", "clip_model.ln_final.weight", "clip_model.ln_final.bias", "visual_decoder.pos_queries", "visual_decoder.layers.0.self_attn.in_proj_weight", "visual_decoder.layers.0.self_attn.in_proj_bias", "visual_decoder.layers.0.self_attn.out_proj.weight", "visual_decoder.layers.0.self_attn.out_proj.bias", "visual_decoder.layers.0.cross_attn.in_proj_weight", "visual_decoder.layers.0.cross_attn.in_proj_bias", "visual_decoder.layers.0.cross_attn.out_proj.weight", "visual_decoder.layers.0.cross_attn.out_proj.bias", "visual_decoder.layers.0.linear1.weight", "visual_decoder.layers.0.linear1.bias", "visual_decoder.layers.0.linear2.weight", "visual_decoder.layers.0.linear2.bias", "visual_decoder.layers.0.norm1.weight", "visual_decoder.layers.0.norm1.bias", "visual_decoder.layers.0.norm2.weight", "visual_decoder.layers.0.norm2.bias", "visual_decoder.layers.0.norm_q.weight", "visual_decoder.layers.0.norm_q.bias", "visual_decoder.layers.0.norm_c.weight", "visual_decoder.layers.0.norm_c.bias", "visual_decoder.text_embed.embedding.weight", "visual_decoder.norm.weight", "visual_decoder.norm.bias", "visual_decoder.head.weight", "visual_decoder.head.bias", "cross_decoder.pos_queries", "cross_decoder.layers.0.self_attn.in_proj_weight", "cross_decoder.layers.0.self_attn.in_proj_bias", "cross_decoder.layers.0.self_attn.out_proj.weight", "cross_decoder.layers.0.self_attn.out_proj.bias", "cross_decoder.layers.0.cross_attn.in_proj_weight", "cross_decoder.layers.0.cross_attn.in_proj_bias", "cross_decoder.layers.0.cross_attn.out_proj.weight", "cross_decoder.layers.0.cross_attn.out_proj.bias", "cross_decoder.layers.0.linear1.weight", "cross_decoder.layers.0.linear1.bias", "cross_decoder.layers.0.linear2.weight", "cross_decoder.layers.0.linear2.bias", "cross_decoder.layers.0.norm1.weight", "cross_decoder.layers.0.norm1.bias", "cross_decoder.layers.0.norm2.weight", "cross_decoder.layers.0.norm2.bias", "cross_decoder.layers.0.norm_q.weight", "cross_decoder.layers.0.norm_q.bias", "cross_decoder.layers.0.norm_c.weight", "cross_decoder.layers.0.norm_c.bias", "cross_decoder.text_embed.embedding.weight", "cross_decoder.norm.weight", "cross_decoder.norm.bias", "cross_decoder.head.weight", "cross_decoder.head.bias". Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "loops", "callbacks", "optimizer_states", "lr_schedulers", "NativeMixedPrecisionPlugin", "hparams_name", "hyper_parameters"

I have done everything as mentioned; what could be the reason?

VamosC commented 3 months ago

Hi, @Szransh Do you set https://github.com/VamosC/CLIP4STR/blob/a9f209a3c019d740f55bcb394a97f1ee720dae15/strhub/models/vl_str/system.py#L22 properly?

Szransh commented 3 months ago

Yes, I have set the path correctly as: CLIP_PATH = '/home/shreyans/scratch/clip4str_og/pretrained/clip/ViT-B-16.pt'

VamosC commented 3 months ago

@Szransh

https://github.com/VamosC/CLIP4STR/blob/a9f209a3c019d740f55bcb394a97f1ee720dae15/strhub/models/vl_str/system.py#L68-L70

loading checkpoint from /home/shreyans/scratch/tata1mg/clip4str_og/pretrained/clip/ViT-B-16.pt
The dimension of the visual decoder is 512.
Traceback (most recent call last):
  File "/DATA/scratch/shreyans/tata1mg/clip4str_og/code/clip4str/strhub/models/utils.py", line 104, in load_from_checkpoint
    model = ModelClass.load_from_checkpoint(checkpoint_path, **kwargs)
  File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 161, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/home/shreyans/scratch/miniconda3/envs/clip4str/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 203, in _load_model_state
    model = cls(**_cls_kwargs)
  File "/DATA/scratch/shreyans/tata1mg/clip4str_og/code/clip4str/strhub/models/vl_str/system.py", line 70, in __init__
    assert os.path.exists(kwargs["clip_pretrained"])
AssertionError

It does not pass the assertion. It seems /home/shreyans/scratch/tata1mg/clip4str_og/pretrained/clip/ViT-B-16.pt or /home/shreyans/scratch/clip4str_og/pretrained/clip/ViT-B-16.pt does not exist.

Or you can simply set CLIP_PATH = '/home/shreyans/scratch/clip4str_og/pretrained/clip'.
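As a quick, hypothetical sanity check (paths taken from the messages above), you could verify from the same environment which of the two locations actually exists:

```python
import os

# Candidate locations for the CLIP ViT-B/16 weights, copied from the logs above.
candidates = [
    "/home/shreyans/scratch/tata1mg/clip4str_og/pretrained/clip/ViT-B-16.pt",
    "/home/shreyans/scratch/clip4str_og/pretrained/clip/ViT-B-16.pt",
]
for path in candidates:
    print(path, "->", "exists" if os.path.exists(path) else "missing")
```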

Szransh commented 3 months ago
(screenshot attached: Screenshot 2024-03-20 at 4.49.37 PM)

CLIP_PATH = '/home/shreyans/scratch/clip4str_og/pretrained/clip' solves the issue, thanks.

Szransh commented 3 months ago

Can you tell me where to find the checkpoints? I am training the model but unable to see the trained checkpoint; all I find are some temp files in the output directory.

VamosC commented 3 months ago

https://github.com/VamosC/CLIP4STR/blob/a9f209a3c019d740f55bcb394a97f1ee720dae15/configs/main.yaml#L58

Checkpoints should be available in the above directory. How long have you been training the model? The program may only save checkpoints after a certain number of training steps.
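If nothing shows up, a small hypothetical check like the following (assuming the output/ root referenced in configs/main.yaml and in the logs above) lists any Lightning checkpoints written so far:

```python
from pathlib import Path

# Recursively look for PyTorch Lightning checkpoint files under the output
# directory (adjust the root to match the path set in configs/main.yaml).
for ckpt in sorted(Path("output").rglob("*.ckpt")):
    size_mb = ckpt.stat().st_size / 1e6
    print(f"{ckpt}  ({size_mb:.1f} MB)")
```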

Szransh commented 3 months ago

Hi, I have been training the model for more than 36 hours. I can find some log files but not the checkpoint.

VamosC commented 3 months ago

@Szransh Hi, that is weird. Can you find a folder with a name like vl4str_xxxx?

Szransh commented 3 months ago

Hi, I finally got a checkpoint in the output folder; it was weird that I didn't get the checkpoint earlier. Can you guide me on how to export the inference results (image name and label) to a CSV file? Where do they actually get printed, so I can save the information?

VamosC commented 3 months ago

Sorry, this is beyond my scope. You can check some online tutorials.

Szransh commented 3 months ago

Where do you print the image and text during inference? Can you share this for the vl4str model?

VamosC commented 3 months ago

Check these lines: https://github.com/VamosC/CLIP4STR/blob/a9f209a3c019d740f55bcb394a97f1ee720dae15/test.py#L177-L183
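For what it's worth, here is a minimal, hypothetical sketch (not part of the repo) of how results could be written to a CSV instead of only being printed, assuming you collect (image_name, predicted_text) pairs inside the inference loop:

```python
import csv

# Hypothetical: `results` would be filled inside your inference loop,
# e.g. results.append((image_filename, predicted_text)).
results = [
    ("img_001.jpg", "HELLO"),
    ("img_002.jpg", "WORLD"),
]

with open("predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image_name", "label"])
    writer.writerows(results)
```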

Szransh commented 3 months ago

Thanks for all the replies; my issue is solved.