Closed gioivuathoi closed 9 months ago
Found this: "Error in lr scheduler after upgrade torch 2". I think the cause is that I'm using torch 2.0.1, so the error can be fixed either by downgrading torch to <2.0 or by patching the source code of PyTorch Lightning 1.6.5 as described in the link. P.S.: the bug has already been fixed in newer versions of PyTorch Lightning.
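For anyone checking whether their environment matches the combination described above (torch 2.x together with an old PyTorch Lightning), here is a minimal sketch using only the standard library. The helper names are made up for illustration, and it assumes the packages are installed under the distribution names `torch` and `pytorch-lightning`.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.1+cu118'."""
    return int(version.split(".")[0])


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def torch_is_2_or_newer(version: str) -> bool:
    """True when the given torch version is in the 2.x (or later) series."""
    return major_version(version) >= 2


if __name__ == "__main__":
    # Assumed distribution names; adjust if your install uses different ones.
    for name in ("torch", "pytorch-lightning"):
        print(name, installed_version(name))

    torch_version = installed_version("torch")
    if torch_version is not None and torch_is_2_or_newer(torch_version):
        print("torch >= 2.0 detected: an old PyTorch Lightning may reject the LR scheduler.")
```

If it reports torch 2.x next to PyTorch Lightning 1.6.5, that is the pairing this issue was about.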
Hi @gioivuathoi, I am also trying to use Multilingual_CLIP to train clip4str for Vietnamese (with a charset of 226 tokens) on a server GPU, but I don't know which version of Multilingual_CLIP to use. If possible, could you send me the link or the name of the version you used? I would also appreciate it if you could tell me what I need to change to match that version, for example the charset size, the code in strhub/models/vl_str/systems.py, and the other files you mentioned. I hope you can help. Thanks, and have a nice day!
Thank you for your great work! I am trying to use Multilingual_CLIP to train clip4str for Vietnamese (with a charset of 229 tokens) on Google Colab. I have changed the charset, the code in strhub/models/vl_str/systems.py, and other files so that I can use the text encoder from Multilingual_CLIP for Vietnamese. Now I am getting the following error from the learning rate scheduler:
The dimension of the visual decoder is 768.
Len of Tokenizer 232
Done creating model!
   | Name | Type | Params
0 | clip_model | CLIP | 427 M
1 | clip_model.visual | VisionTransformer | 303 M
2 | clip_model.transformer | Transformer | 85.1 M
3 | clip_model.token_embedding | Embedding | 37.9 M
4 | clip_model.ln_final | LayerNorm | 1.5 K
5 | M_clip_model | MultilingualCLIP | 560 M
6 | M_clip_model.transformer | XLMRobertaModel | 559 M
7 | M_clip_model.LinearTransformation | Linear | 787 K
8 | visual_decoder | Decoder | 9.8 M
9 | visual_decoder.layers | ModuleList | 9.5 M
10 | visual_decoder.text_embed | TokenEmbedding | 178 K
11 | visual_decoder.norm | LayerNorm | 1.5 K
12 | visual_decoder.dropout | Dropout | 0
13 | visual_decoder.head | Linear | 176 K
14 | cross_decoder | Decoder | 9.8 M
15 | cross_decoder.layers | ModuleList | 9.5 M
16 | cross_decoder.text_embed | TokenEmbedding | 178 K
17 | cross_decoder.norm | LayerNorm | 1.5 K
18 | cross_decoder.dropout | Dropout | 0
19 | cross_decoder.head | Linear | 176 K
675 M Trainable params
332 M Non-trainable params
1.0 B Total params
4,031.815 Total estimated model params size (MB)
[dataset] mean (0.48145466, 0.4578275, 0.40821073), std (0.26862954, 0.26130258, 0.27577711)
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:117: UserWarning: When using `Trainer(accumulate_grad_batches != 1)` and overriding `LightningModule.optimizer_{step,zero_grad}`, the hooks will not be called on every batch (rather, they are called on every optimization step).
rank_zero_warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[VL4STR] The length of encoder params with and without weight decay is 259 and 479, respectively.
[VL4STR] The length of decoder params with and without weight decay is 14 and 38, respectively.
Loading `train_dataloader` to estimate number of stepping batches.
dataset root: /content/drive/MyDrive/clip4str/dataset/str_dataset/train/real
lmdb: ArT num samples: 34984
lmdb: The number of training samples is 34984
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Error executing job with overrides: []
Traceback (most recent call last):
File "/content/drive/MyDrive/clip4str/code/clip4str/train.py", line 145, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/hydra/main.py", line 90, in decorated_main
_run_hydra(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
_run_app(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
run_and_report(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 216, in run_and_report
raise ex
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
return func()
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
lambda: hydra.run(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/content/drive/MyDrive/clip4str/code/clip4str/train.py", line 104, in main
trainer.fit(model, datamodule=datamodule, ckpt_path=config.ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1217, in _run
self.strategy.setup(self)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/single_device.py", line 72, in setup
super().setup(trainer)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 139, in setup
self.setup_optimizers(trainer)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 128, in setup_optimizers
self.optimizers, self.lr_scheduler_configs, self.optimizer_frequencies = _init_optimizers_and_lr_schedulers(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 195, in _init_optimizers_and_lr_schedulers
_validate_scheduler_api(lr_scheduler_configs, model)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 350, in _validate_scheduler_api
raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: The provided lr scheduler `OneCycleLR` doesn't follow PyTorch's LRScheduler API. You should override the `LightningModule.lr_scheduler_step` hook with your own logic if you are using a custom LR scheduler.

I can not see any problem in OneCycleLR, do you have any suggestions for me with this matter? Is it a problem of package version?