Open philp123 opened 1 month ago
When I set `augment: false` (in configs/main.yaml), the code runs fine with `bash scripts/vl4str_base.sh`, so something seems to be wrong with the data augmentation.
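For reference, the augmentation path can be exercised on its own with something like the sketch below. It assumes this fork keeps PARSeq's `strhub/data/augment.py` helper; the `rand_augment_transform()` call and the dummy image size are only for illustration.

```python
# Rough repro sketch, not the repo's own test code: build the RandAugment pipeline
# that data.augment=true adds to the training transform and apply it to a dummy image.
from PIL import Image

from strhub.data.augment import rand_augment_transform  # assumed PARSeq-style helper

img = Image.new('RGB', (128, 32))     # dummy image; the failure happens before the pixels are touched
transform = rand_augment_transform()  # same helper the data module uses when augment is true
out = transform(img)                  # on my machine this raises the TypeError shown in the log below
```

With `augment: false` this transform is never built, which would explain why training then runs.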
That's weird. I have never run into this problem.
Could the Python version matter here, as in https://github.com/VamosC/CLIP4STR/issues/15?
What is the `timm` version?
`pip install timm==0.5.4` solved the above error for me.
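For anyone checking their environment before downgrading, the active version can be printed with plain Python (nothing CLIP4STR-specific here):

```python
# Print the timm version that is actually active in the clip4str environment.
import timm

print(timm.__version__)  # 0.5.4 is the version that worked for me
```

If it prints something newer than 0.5.4, the downgrade above is worth trying first.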
```
(clip4str) root@Lab-PC:/workspace/Project/OCR/CLIP4STR# bash scripts/vl4str_base.sh
abs_root: /home/shuai
model: _convert_: all img_size: 16 embed_dim: 512 enc_num_heads: 12 enc_mlp_ratio: 4 enc_depth: 12 enc_width: 768 dec_num_heads: 8 dec_mlp_ratio: 4 dec_depth: 1 enc_del_cls: false dec_ndim_no_decay: true context_length: 16 use_language_model: true image_detach: true type_embedding: false cross_gt_context: true cross_cloze_mask: false cross_extra_attn: false cross_correct_once: false cross_loss_w: 1.0 itm_loss: false itm_loss_weight: 0.1 cross_token_embeding: false fusion_model: false image_freeze_nlayer: -1 text_freeze_nlayer: 6 image_freeze_layer_divisor: 0 image_only_fc: false use_share_dim: true clip_cls_eot_feature: false lr: 8.4e-05 coef_lr: 19.0 coef_wd: 1.0 perm_num: 6 perm_forward: true perm_mirrored: true dropout: 0.1 decode_ar: true refine_iters: 1 freeze_backbone: false freeze_language_backbone: false clip_pretrained: /workspace/Project/OCR/CLIP4STR/pretrained/models--laion--CLIP-ViT-B-16-DataComp.XL-s13B-b90K/open_clip_pytorch_model.bin find_unused_parameters: true
data: _target_: strhub.data.module.SceneTextDataModule root_dir: /workspace/Database/OCR/CLIP4STR/str_dataset_ub output_url: null train_dir: real batch_size: ${model.batch_size} img_size: ${model.img_size} charset_train: ${model.charset_train} charset_test: ${model.charset_test} max_label_length: ${model.max_label_length} remove_whitespace: true normalize_unicode: true augment: true num_workers: 8 openai_meanstd: true
trainer: _target_: pytorch_lightning.Trainer _convert_: all val_check_interval: 2000 max_epochs: 11 gradient_clip_val: 20 gpus: 1 accumulate_grad_batches: 4 precision: 16
ckpt_path: null pretrained: null swa: false
config of VL4STR: image_freeze_nlayer: -1, text_freeze_nlayer: 6, freeze_language_backbone: False, freeze_image_backbone: False
use_language_model: True, context_length: 16, cross_token_embeding: False, cross_loss_weight: 1.0
use_share_dim: True, image_detach: True, clip_cls_eot_feature: False
cross_gt_context: True, cross_cloze_mask: False, cross_fast_decode: False
loading checkpoint from /workspace/Project/OCR/CLIP4STR/pretrained/models--laion--CLIP-ViT-B-16-DataComp.XL-s13B-b90K/open_clip_pytorch_model.bin
/workspace/Project/OCR/CLIP4STR/strhub/clip/clip.py:139: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location="cpu")
The dimension of the visual decoder is 512.
   | Name                       | Type              | Params
0  | clip_model                 | CLIP              | 149 M
1  | clip_model.visual          | VisionTransformer | 86.2 M
2  | clip_model.transformer     | Transformer       | 37.8 M
3  | clip_model.token_embedding | Embedding         | 25.3 M
4  | clip_model.ln_final        | LayerNorm         | 1.0 K
5  | visual_decoder             | Decoder           | 4.3 M
6  | visual_decoder.layers      | ModuleList        | 4.2 M
7  | visual_decoder.text_embed  | TokenEmbedding    | 49.7 K
8  | visual_decoder.norm        | LayerNorm         | 1.0 K
9  | visual_decoder.dropout     | Dropout           | 0
10 | visual_decoder.head        | Linear            | 48.7 K
11 | cross_decoder              | Decoder           | 4.3 M
12 | cross_decoder.layers       | ModuleList        | 4.2 M
13 | cross_decoder.text_embed   | TokenEmbedding    | 49.7 K
14 | cross_decoder.norm         | LayerNorm         | 1.0 K
15 | cross_decoder.dropout      | Dropout           | 0
16 | cross_decoder.head         | Linear            | 48.7 K
114 M     Trainable params
44.3 M    Non-trainable params
158 M     Total params
633.025   Total estimated model params size (MB)
[dataset] mean (0.48145466, 0.4578275, 0.40821073), std (0.26862954, 0.26130258, 0.27577711)
/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:478: LightningDeprecationWarning: Setting `Trainer(gpus=1)` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=1)` instead.
  rank_zero_deprecation(
Using 16bit None Automatic Mixed Precision (AMP)
/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/native_amp.py:47: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  scaler = torch.cuda.amp.GradScaler()
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/configuration_validator.py:92: UserWarning: When using `Trainer(accumulate_grad_batches != 1)` and overriding `LightningModule.optimizer_{step,zero_grad}`, the hooks will not be called on every batch (rather, they are called on every optimization step).
  rank_zero_warn(
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[VL4STR] The length of encoder params with and without weight decay is 76 and 151, respectively.
[VL4STR] The length of decoder params with and without weight decay is 14 and 38, respectively.
Loading `train_dataloader` to estimate number of stepping batches.
dataset root: /workspace/Database/OCR/CLIP4STR/str_dataset_ub/train/real
lmdb: ArT/train num samples: 28828
lmdb: ArT/val num samples: 3200
lmdb: LSVT/test num samples: 4093
lmdb: LSVT/train num samples: 33199
lmdb: LSVT/val num samples: 4147
lmdb: benchmark/IIIT5k num samples: 2000
lmdb: benchmark/IC15 num samples: 4468
lmdb: benchmark/IC13 num samples: 848
lmdb: benchmark/SVT num samples: 257
lmdb: ReCTS/test num samples: 2467
lmdb: ReCTS/train num samples: 21589
lmdb: ReCTS/val num samples: 2376
lmdb: TextOCR/train num samples: 710994
lmdb: TextOCR/val num samples: 107093
lmdb: OpenVINO/train_5 num samples: 495833
lmdb: OpenVINO/train_2 num samples: 502769
lmdb: OpenVINO/train_f num samples: 470562
lmdb: OpenVINO/train_1 num samples: 443620
lmdb: OpenVINO/validation num samples: 158757
lmdb: RCTW17/test num samples: 1030
lmdb: RCTW17/train num samples: 8225
lmdb: RCTW17/val num samples: 1029
lmdb: MLT19/test num samples: 5669
lmdb: MLT19/train num samples: 45384
lmdb: MLT19/val num samples: 5674
lmdb: COCOv2.0/train num samples: 59733
lmdb: COCOv2.0/val num samples: 13394
lmdb: Union14M-L-LMDB/medium num samples: 218154
lmdb: Union14M-L-LMDB/hard num samples: 145523
lmdb: Union14M-L-LMDB/hell num samples: 479156
lmdb: Union14M-L-LMDB/difficult num samples: 297164
lmdb: Union14M-L-LMDB/simple num samples: 2076687
lmdb: Uber/train num samples: 91732
lmdb: Uber/val num samples: 36188
The number of training samples is 6481842
Sanity Checking: 0it [00:00, ?it/s]
dataset root: /workspace/Database/OCR/CLIP4STR/str_dataset_ub/val
lmdb: IIIT5k num samples: 2000
lmdb: IC15 num samples: 4467
lmdb: IC13 num samples: 843
lmdb: SVT num samples: 257
The number of validation samples is 7567
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/nn/functional.py:5193: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead.
  warnings.warn(
Epoch 0:   0%|          | 0/25680 [00:00<?, ?it/s]
Error executing job with overrides: ['+experiment=vl4str', 'model=vl4str', 'dataset=real', 'data.root_dir=/workspace/Database/OCR/CLIP4STR/str_dataset_ub', 'trainer.max_epochs=11', 'trainer.gpus=1', 'model.lr=8.4e-5', 'model.batch_size=256', 'model.clip_pretrained=/workspace/Project/OCR/CLIP4STR/pretrained/models--laion--CLIP-ViT-B-16-DataComp.XL-s13B-b90K/open_clip_pytorch_model.bin', 'trainer.accumulate_grad_batches=4']
Traceback (most recent call last):
  File "/workspace/Project/OCR/CLIP4STR/train.py", line 141, in <module>
    main()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/workspace/Project/OCR/CLIP4STR/train.py", line 100, in main
    trainer.fit(model, datamodule=datamodule, ckpt_path=config.ckpt_path)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
    self._run_train()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 187, in advance
    batch = next(data_fetcher)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
    return self.fetching_function()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 265, in fetching_function
    self._fetch_next_batch(self.dataloader_iter)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 280, in _fetch_next_batch
    batch = next(iterator)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/supporters.py", line 571, in __next__
    return self.request_next_batch(self.loader_iters)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/supporters.py", line 583, in request_next_batch
    return apply_to_collection(loader_iters, Iterator, next)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py", line 64, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
    return self._process_data(data)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
    data.reraise()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/_utils.py", line 706, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/dataset.py", line 350, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/workspace/Project/OCR/CLIP4STR/strhub/data/dataset.py", line 134, in __getitem__
    img = self.transform(img)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torchvision/transforms/transforms.py", line 95, in __call__
    img = t(img)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/timm/data/auto_augment.py", line 751, in __call__
    img = op(img)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/timm/data/auto_augment.py", line 381, in __call__
    if self.prob < 1.0 and random.random() > self.prob:
TypeError: '<' not supported between instances of 'dict' and 'float'
```

The error seems to be related to the dataset, and it occurs during data augmentation.
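My guess at the mechanism, judging from the last two frames: recent timm releases put a `prob` parameter in the second position of `rand_augment_ops`, while timm 0.5.4 had `hparams` there, so PARSeq-style code that passes `hparams` positionally would end up storing a dict in `AugmentOp.prob`, which is exactly the comparison that fails. A minimal sketch of that suspected misbinding (the `hparams` values are just an example, not the repo's):

```python
# Suspected misbinding on recent timm, where the signature is roughly
# rand_augment_ops(magnitude, prob, hparams, transforms). On timm 0.5.4 the second
# positional argument was hparams, so an old call site now puts a dict into `prob`.
from PIL import Image
from timm.data import auto_augment

hparams = {'rotate_deg': 30}                      # example hyper-parameter dict
ops = auto_augment.rand_augment_ops(5, hparams)   # the dict binds to `prob` on recent timm
ops[0](Image.new('RGB', (128, 32)))               # TypeError: '<' not supported between 'dict' and 'float'
```

If that is what is happening, pinning `timm==0.5.4` as suggested above, or passing `hparams=` by keyword, should both avoid it.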