The provided lr scheduler `OneCycleLR` doesn't follow PyTorch's LRScheduler API

gioivuathoi commented 9 months ago

Thank you for your great work! I am trying to use Multilingual_CLIP to train clip4str for Vietnamese (the charset contains 229 tokens) on Google Colab. I changed the charset, the code in strhub/models/vl_str/systems.py, and some other files so that I can use Text_encoder from Multilingual_CLIP for Vietnamese. Now I am getting the following learning rate scheduler error:

The dimension of the visual decoder is 768.
Len of Tokenizer 232
Done creating model!

   | Name | Type | Params
0 | clip_model | CLIP | 427 M
1 | clip_model.visual | VisionTransformer | 303 M
2 | clip_model.transformer | Transformer | 85.1 M
3 | clip_model.token_embedding | Embedding | 37.9 M
4 | clip_model.ln_final | LayerNorm | 1.5 K
5 | M_clip_model | MultilingualCLIP | 560 M
6 | M_clip_model.transformer | XLMRobertaModel | 559 M
7 | M_clip_model.LinearTransformation | Linear | 787 K
8 | visual_decoder | Decoder | 9.8 M
9 | visual_decoder.layers | ModuleList | 9.5 M
10 | visual_decoder.text_embed | TokenEmbedding | 178 K
11 | visual_decoder.norm | LayerNorm | 1.5 K
12 | visual_decoder.dropout | Dropout | 0
13 | visual_decoder.head | Linear | 176 K
14 | cross_decoder | Decoder | 9.8 M
15 | cross_decoder.layers | ModuleList | 9.5 M
16 | cross_decoder.text_embed | TokenEmbedding | 178 K
17 | cross_decoder.norm | LayerNorm | 1.5 K
18 | cross_decoder.dropout | Dropout | 0
19 | cross_decoder.head | Linear | 176 K

675 M Trainable params
332 M Non-trainable params
1.0 B Total params
4,031.815 Total estimated model params size (MB)
[dataset] mean (0.48145466, 0.4578275, 0.40821073), std (0.26862954, 0.26130258, 0.27577711)
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:117: UserWarning: When using Trainer(accumulate_grad_batches != 1) and overriding LightningModule.optimizer_{step,zero_grad}, the hooks will not be called on every batch (rather, they are called on every optimization step).
rank_zero_warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[VL4STR] The length of encoder params with and without weight decay is 259 and 479, respectively.
[VL4STR] The length of decoder params with and without weight decay is 14 and 38, respectively.
Loading train_dataloader to estimate number of stepping batches.
dataset root: /content/drive/MyDrive/clip4str/dataset/str_dataset/train/real
lmdb: ArT num samples: 34984
lmdb: The number of training samples is 34984
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Error executing job with overrides: []
Traceback (most recent call last):
  File "/content/drive/MyDrive/clip4str/code/clip4str/train.py", line 145, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/content/drive/MyDrive/clip4str/code/clip4str/train.py", line 104, in main
    trainer.fit(model, datamodule=datamodule, ckpt_path=config.ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1217, in _run
    self.strategy.setup(self)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/single_device.py", line 72, in setup
    super().setup(trainer)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 139, in setup
    self.setup_optimizers(trainer)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 128, in setup_optimizers
    self.optimizers, self.lr_scheduler_configs, self.optimizer_frequencies = _init_optimizers_and_lr_schedulers(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 195, in _init_optimizers_and_lr_schedulers
    _validate_scheduler_api(lr_scheduler_configs, model)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 350, in _validate_scheduler_api
    raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: The provided lr scheduler `OneCycleLR` doesn't follow PyTorch's LRScheduler API. You should override the `LightningModule.lr_scheduler_step` hook with your own logic if you are using a custom LR scheduler.
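The exception message itself points at one workaround: overriding the `lr_scheduler_step` hook. A minimal sketch of that follows, assuming the three-argument hook signature that the Lightning 1.6 to 1.9 documentation describes (Lightning 2.x drops `optimizer_idx`, so check the docs for the installed version). `PlainSchedulerStepMixin` and `_FakeScheduler` are hypothetical names introduced only for illustration; they are not part of CLIP4STR or Lightning.

```python
class PlainSchedulerStepMixin:
    """Hypothetical mixin: list it before the LightningModule base class."""

    def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
        # Assumed signature for Lightning 1.6-1.9; Lightning 2.x omits optimizer_idx.
        if metric is None:
            scheduler.step()        # schedulers such as OneCycleLR take no metric
        else:
            scheduler.step(metric)  # schedulers that monitor a metric


class _FakeScheduler:
    """Stand-in scheduler so the sketch runs without torch installed."""

    def __init__(self):
        self.calls = []

    def step(self, metric=None):
        self.calls.append(metric)


fake = _FakeScheduler()
hook_owner = PlainSchedulerStepMixin()
hook_owner.lr_scheduler_step(fake, optimizer_idx=0, metric=None)
hook_owner.lr_scheduler_step(fake, optimizer_idx=0, metric=0.5)
print(fake.calls)
```

In a real project the method would more likely be added directly to the existing LightningModule subclass; the mixin form is used here only to keep the example free of third-party imports.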

I cannot see any problem with OneCycleLR. Do you have any suggestions? Could it be a package version problem?
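A quick way to check the package-version question is to print what is actually installed in the Colab runtime. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a hypothetical helper name, and the distribution names `torch` and `pytorch-lightning` are the usual PyPI names, assumed here to match how the packages were installed.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages involved in the traceback.
for name in ("torch", "pytorch-lightning"):
    print(name, installed_version(name))
```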

gioivuathoi commented 9 months ago

Found this: "Error in lr scheduler after upgrade torch 2". I think the cause is that I am using torch 2.0.1, so the error can be fixed either by downgrading torch to <2.0 or by patching the source code of PyTorch Lightning 1.6.5 as described in the link. P.S. The bug has been fixed in newer versions of PyTorch Lightning.
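To illustrate why an unmodified `OneCycleLR` can fail the check, here is a sketch with stand-in classes rather than the real libraries. It assumes that torch 2.x introduced a public `LRScheduler` base class and kept `_LRScheduler` only as a subclass for backward compatibility, and that Lightning 1.6.5 validates schedulers with an `isinstance` check against the old private name. `follows_api` is a hypothetical simplification of that validation, not Lightning's actual function.

```python
# Stand-ins mirroring the assumed torch 2.x class layout (not the real classes).
class LRScheduler:                  # new public base class
    def step(self):
        pass


class _LRScheduler(LRScheduler):    # old private name, kept as a subclass
    pass


class OneCycleLR(LRScheduler):      # built-in schedulers derive from the new base
    pass


def follows_api(scheduler, supported_types):
    """Hypothetical simplification of the scheduler type validation."""
    return isinstance(scheduler, supported_types)


scheduler = OneCycleLR()
old_check = follows_api(scheduler, (_LRScheduler,))     # check against old name
patched_check = follows_api(scheduler, (LRScheduler,))  # check against new base
print(old_check, patched_check)
```

Under these assumptions the old check rejects the scheduler while a check against the new base class accepts it, which is consistent with both suggested fixes: downgrading torch restores the old hierarchy, and patching Lightning changes the type it checks against.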

TruongNoDame commented 9 months ago

Hi @gioivuathoi, I am also trying to use Multilingual_CLIP to train clip4str for Vietnamese (with a charset of 226 tokens, on a server GPU), but I don't know which version of Multilingual_CLIP to use. If possible, could you send me the link or the name of that version? I would also appreciate it if you could tell me what I need to change to match that version, for example the number of charset tokens, the code in strhub/models/vl_str/systems.py, and the other files you mentioned. I hope you can help. Thanks, and have a nice day!

VamosC / CLIP4STR

The provided lr scheduler `OneCycleLR` doesn't follow PyTorch's LRScheduler API #4
