VamosC / CLIP4STR

An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
Apache License 2.0

Training Error #27

Open philp123 opened 2 days ago

philp123 commented 2 days ago

(clip4str) root@Lab-PC:/workspace/Project/OCR/CLIP4STR# bash scripts/vl4str_base.sh
abs_root: /home/shuai model: convert: all img_size:

loading checkpoint from /workspace/Project/OCR/CLIP4STR/pretrained/models--laion--CLIP-ViT-B-16-DataComp.XL-s13B-b90K/open_clip_pytorch_model.bin
/workspace/Project/OCR/CLIP4STR/strhub/clip/clip.py:139: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(model_path, map_location="cpu")
The dimension of the visual decoder is 512.

   | Name                       | Type              | Params
0  | clip_model                 | CLIP              | 149 M
1  | clip_model.visual          | VisionTransformer | 86.2 M
2  | clip_model.transformer     | Transformer       | 37.8 M
3  | clip_model.token_embedding | Embedding         | 25.3 M
4  | clip_model.ln_final        | LayerNorm         | 1.0 K
5  | visual_decoder             | Decoder           | 4.3 M
6  | visual_decoder.layers      | ModuleList        | 4.2 M
7  | visual_decoder.text_embed  | TokenEmbedding    | 49.7 K
8  | visual_decoder.norm        | LayerNorm         | 1.0 K
9  | visual_decoder.dropout     | Dropout           | 0
10 | visual_decoder.head        | Linear            | 48.7 K
11 | cross_decoder              | Decoder           | 4.3 M
12 | cross_decoder.layers       | ModuleList        | 4.2 M
13 | cross_decoder.text_embed   | TokenEmbedding    | 49.7 K
14 | cross_decoder.norm         | LayerNorm         | 1.0 K
15 | cross_decoder.dropout      | Dropout           | 0
16 | cross_decoder.head         | Linear            | 48.7 K

114 M     Trainable params
44.3 M    Non-trainable params
158 M     Total params
633.025   Total estimated model params size (MB)

[dataset] mean (0.48145466, 0.4578275, 0.40821073), std (0.26862954, 0.26130258, 0.27577711)
/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:478: LightningDeprecationWarning: Setting Trainer(gpus=1) is deprecated in v1.7 and will be removed in v2.0. Please use Trainer(accelerator='gpu', devices=1) instead.
  rank_zero_deprecation(
Using 16bit None Automatic Mixed Precision (AMP)
/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/native_amp.py:47: FutureWarning: torch.cuda.amp.GradScaler(args...) is deprecated. Please use torch.amp.GradScaler('cuda', args...) instead.
  scaler = torch.cuda.amp.GradScaler()
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/configuration_validator.py:92: UserWarning: When using Trainer(accumulate_grad_batches != 1) and overriding LightningModule.optimizer_{step,zero_grad}, the hooks will not be called on every batch (rather, they are called on every optimization step).
  rank_zero_warn(
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set torch.set_float32_matmul_precision('medium' | 'high') which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[VL4STR] The length of encoder params with and without weight decay is 76 and 151, respectively.
[VL4STR] The length of decoder params with and without weight decay is 14 and 38, respectively.
Loading train_dataloader to estimate number of stepping batches.
dataset root: /workspace/Database/OCR/CLIP4STR/str_dataset_ub/train/real
lmdb: ArT/train num samples: 28828
lmdb: ArT/val num samples: 3200
lmdb: LSVT/test num samples: 4093
lmdb: LSVT/train num samples: 33199
lmdb: LSVT/val num samples: 4147
lmdb: benchmark/IIIT5k num samples: 2000
lmdb: benchmark/IC15 num samples: 4468
lmdb: benchmark/IC13 num samples: 848
lmdb: benchmark/SVT num samples: 257
lmdb: ReCTS/test num samples: 2467
lmdb: ReCTS/train num samples: 21589
lmdb: ReCTS/val num samples: 2376
lmdb: TextOCR/train num samples: 710994
lmdb: TextOCR/val num samples: 107093
lmdb: OpenVINO/train_5 num samples: 495833
lmdb: OpenVINO/train_2 num samples: 502769
lmdb: OpenVINO/train_f num samples: 470562
lmdb: OpenVINO/train_1 num samples: 443620
lmdb: OpenVINO/validation num samples: 158757
lmdb: RCTW17/test num samples: 1030
lmdb: RCTW17/train num samples: 8225
lmdb: RCTW17/val num samples: 1029
lmdb: MLT19/test num samples: 5669
lmdb: MLT19/train num samples: 45384
lmdb: MLT19/val num samples: 5674
lmdb: COCOv2.0/train num samples: 59733
lmdb: COCOv2.0/val num samples: 13394
lmdb: Union14M-L-LMDB/medium num samples: 218154
lmdb: Union14M-L-LMDB/hard num samples: 145523
lmdb: Union14M-L-LMDB/hell num samples: 479156
lmdb: Union14M-L-LMDB/difficult num samples: 297164
lmdb: Union14M-L-LMDB/simple num samples: 2076687
lmdb: Uber/train num samples: 91732
lmdb: Uber/val num samples: 36188
lmdb: The number of training samples is 6481842
Sanity Checking: 0it [00:00, ?it/s]
dataset root: /workspace/Database/OCR/CLIP4STR/str_dataset_ub/val
lmdb: IIIT5k num samples: 2000
lmdb: IC15 num samples: 4467
lmdb: IC13 num samples: 843
lmdb: SVT num samples: 257
lmdb: The number of validation samples is 7567
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]
/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/nn/functional.py:5193: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead.
  warnings.warn(
Epoch 0: 0%| | 0/25680 [00:00<?, ?it/s]
Error executing job with overrides: ['+experiment=vl4str', 'model=vl4str', 'dataset=real', 'data.root_dir=/workspace/Database/OCR/CLIP4STR/str_dataset_ub', 'trainer.max_epochs=11', 'trainer.gpus=1', 'model.lr=8.4e-5', 'model.batch_size=256', 'model.clip_pretrained=/workspace/Project/OCR/CLIP4STR/pretrained/models--laion--CLIP-ViT-B-16-DataComp.XL-s13B-b90K/open_clip_pytorch_model.bin', 'trainer.accumulate_grad_batches=4']
Traceback (most recent call last):
  File "/workspace/Project/OCR/CLIP4STR/train.py", line 141, in <module>
    main()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/workspace/Project/OCR/CLIP4STR/train.py", line 100, in main
    trainer.fit(model, datamodule=datamodule, ckpt_path=config.ckpt_path)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
    self._run_train()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 187, in advance
    batch = next(data_fetcher)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
    return self.fetching_function()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 265, in fetching_function
    self._fetch_next_batch(self.dataloader_iter)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/utilities/fetching.py", line 280, in _fetch_next_batch
    batch = next(iterator)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/supporters.py", line 571, in __next__
    return self.request_next_batch(self.loader_iters)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/pytorch_lightning/trainer/supporters.py", line 583, in request_next_batch
    return apply_to_collection(loader_iters, Iterator, next)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/lightning_utilities/core/apply_func.py", line 64, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
    return self._process_data(data)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
    data.reraise()
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/_utils.py", line 706, in reraise
    raise exception
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torch/utils/data/dataset.py", line 350, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/workspace/Project/OCR/CLIP4STR/strhub/data/dataset.py", line 134, in __getitem__
    img = self.transform(img)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/torchvision/transforms/transforms.py", line 95, in __call__
    img = t(img)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/timm/data/auto_augment.py", line 751, in __call__
    img = op(img)
  File "/root/anaconda3/envs/clip4str/lib/python3.10/site-packages/timm/data/auto_augment.py", line 381, in __call__
    if self.prob < 1.0 and random.random() > self.prob:
TypeError: '<' not supported between instances of 'dict' and 'float'

The error seems to be related to the dataset pipeline, specifically during data augmentation.
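
The last frames of the traceback are inside timm/data/auto_augment.py, where self.prob is a dict rather than a float. A quick check (a hypothetical diagnostic sketch, assuming the augmentation pipeline in strhub/data/augment.py is built on timm.data.auto_augment, as the traceback suggests) is to print the installed timm version and the current signature of rand_augment_ops, since its parameters have changed across timm releases:

```python
# Hypothetical diagnostic: print the installed timm version and the signature of
# timm.data.auto_augment.rand_augment_ops. If the installed timm added or reordered
# parameters (e.g. a separate probability slot), a dict passed positionally by an
# older caller could end up where a float is expected, which would explain
# "'<' not supported between instances of 'dict' and 'float'".
import inspect

import timm
from timm.data import auto_augment

print("timm version    :", timm.__version__)
print("rand_augment_ops:", inspect.signature(auto_augment.rand_augment_ops))
```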

philp123 commented 2 days ago

When I set augment: false (in configs/main.yaml), the code runs fine with "bash scripts/vl4str_base.sh", so something seems to be wrong with the data augmentation.
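
For reference, the same TypeError appears whenever a dict ends up bound to a parameter that is later compared with a float, which is what the final traceback frame (if self.prob < 1.0 ...) suggests is happening inside timm. A minimal, self-contained illustration (hypothetical code, not the repo's):

```python
# Minimal illustration of the failure mode (hypothetical, not the repo's code):
# a dict accidentally bound to a float parameter, e.g. after a library changed
# its positional parameter order between versions.
import random


def apply_op(img, prob=0.5):
    # Mirrors the check in the failing timm frame: `if self.prob < 1.0 and ...`
    if prob < 1.0 and random.random() > prob:
        return img  # op skipped
    return img  # op applied


hparams = {"rotate_deg": 30}  # a config dict, not a probability
try:
    apply_op("image", hparams)  # dict lands in the `prob` slot
except TypeError as err:
    print(err)  # '<' not supported between instances of 'dict' and 'float'
```

So the augmentation code is probably handing timm a configuration dict where the installed timm expects a probability, which would also explain why disabling augmentation avoids the crash.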

mzhaoshuai commented 2 days ago

That's weird. I have never run into this problem.

Could the Python version be the issue, as in https://github.com/VamosC/CLIP4STR/issues/15?

What is the timm version?
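
To narrow it down, could you paste the output of something like the following? (A hypothetical snippet; the versions the project was developed against should be in the repo's requirements file.)

```python
# Hypothetical environment report to paste back into the issue. Newer timm
# releases reworked timm.data.auto_augment, which can break callers that were
# written against an older signature.
import platform

import timm
import torch
import torchvision

print("python     :", platform.python_version())
print("torch      :", torch.__version__)
print("torchvision:", torchvision.__version__)
print("timm       :", timm.__version__)
```

If your timm is newer than the version pinned in the requirements, it would be worth downgrading to the pinned version and trying again with augmentation enabled.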