Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Issue with scheduler referencing objects #2309

Closed Geeks-Sid closed 4 years ago

Geeks-Sid commented 4 years ago

🐛 Bug

The scheduler does not work properly with the optimizer.

Optimizer used: Adam
Scheduler used: CosineAnnealingWarmRestarts

Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
Traceback (most recent call last):
  File "colon_seg_run", line 110, in <module>
    trainer_main.train_network(hparams)
  File "/home/siddhesh/Work/Projects/Colon-Seg/Colon_Seg/trainer/trainer_main.py", line 125, in train_network
    trainer.fit(model)
  File "/home/siddhesh/anaconda3/envs/histotorch_16bit/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 912, in fit
    self.dp_train(model)
  File "/home/siddhesh/anaconda3/envs/histotorch_16bit/lib/python3.6/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 246, in dp_train
    self.reinit_scheduler_properties(optimizers, self.lr_schedulers)
  File "/home/siddhesh/anaconda3/envs/histotorch_16bit/lib/python3.6/site-packages/pytorch_lightning/trainer/optimizers.py", line 122, in reinit_scheduler_properties
    scheduler.__class__.__mro__[idx].__init__(scheduler, optimizer)
  File "/home/siddhesh/anaconda3/envs/histotorch_16bit/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 74, in __init__
    self.optimizer.step = with_counter(self.optimizer.step)
  File "/home/siddhesh/anaconda3/envs/histotorch_16bit/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 56, in with_counter
    instance_ref = weakref.ref(method.__self__)
AttributeError: 'function' object has no attribute '__self__'
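
The last frame fails because `with_counter` reads `method.__self__`, which exists only on bound methods. One possible cause, and this is an assumption rather than something the traceback proves, is that `optimizer.step` had already been replaced by a plain function (for example by a mixed-precision wrapper) before the scheduler was re-initialized. The sketch below reproduces that situation without torch. `FakeOptimizer` and `patched_step` are hypothetical stand-ins, and `with_counter` is a simplified imitation of the function named in the traceback, not the library code.

```python
import weakref


class FakeOptimizer:
    """Hypothetical stand-in for a torch optimizer."""

    def step(self):
        return "stepped"


def with_counter(method):
    # Simplified imitation of the failing line: it needs a bound method,
    # because only bound methods carry __self__ and __func__.
    instance_ref = weakref.ref(method.__self__)
    func = method.__func__

    def wrapper(*args, **kwargs):
        return func(instance_ref(), *args, **kwargs)

    return wrapper


opt = FakeOptimizer()

# Case 1: opt.step is a bound method, so wrapping works.
result_ok = with_counter(opt.step)()


# Case 2: something replaces opt.step with a plain function (assumed cause).
def patched_step():
    return "patched"


opt.step = patched_step

try:
    with_counter(opt.step)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(result_ok)
print(error_message)
```

If this is what happened, the error depends on the order in which the optimizer is patched and the scheduler is constructed, not on the scheduler type itself.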

To Reproduce

Steps to reproduce the behavior: train a Lightning model with configure_optimizers defined as follows:

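    # Requires: from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts
    def configure_optimizers(self):
        # Set up the optimizer (fetch_optimizer is a project helper not shown in this issue)
        optimizer = fetch_optimizer(self.hparams.optimizer,
                                    self.hparams.learning_rate,
                                    self.model)
        # Scheduler config dict: the scheduler instance plus a name entry
        lr_scheduler = {'scheduler': CosineAnnealingWarmRestarts(optimizer,
                                                                 T_0=5,
                                                                 T_mult=1,
                                                                 eta_min=1e-6,
                                                                 last_epoch=-1),
                        'name': 'cosine-annealing-warm-restarts'}
        return [optimizer], [lr_scheduler]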
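
For reference, the schedule these arguments ask for can be sketched in plain Python using the standard cosine-annealing-with-warm-restarts formula, lr = eta_min + (base_lr - eta_min) * (1 + cos(pi * T_cur / T_i)) / 2. The base learning rate of 1e-3 is an assumption, since the issue does not state `hparams.learning_rate`, and `cosine_warm_restart_lr` is a hypothetical helper for illustration, not the PyTorch class.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr, t_0=5, t_mult=1, eta_min=1e-6):
    """Learning rate at an integer epoch under cosine annealing with warm restarts."""
    # Find the position inside the current cycle and that cycle's length.
    t_cur, t_i = epoch, t_0
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    cosine = (1 + math.cos(math.pi * t_cur / t_i)) / 2
    return eta_min + (base_lr - eta_min) * cosine


BASE_LR = 1e-3  # assumed value; not given in the issue

schedule = [cosine_warm_restart_lr(epoch, BASE_LR) for epoch in range(10)]
for epoch, lr in enumerate(schedule):
    print(epoch, lr)
```

With T_0=5 and T_mult=1, the rate restarts at the base value every 5 epochs and decays toward eta_min in between.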

Expected behavior

Training should run normally.

github-actions[bot] commented 4 years ago

Hi! Thanks for your contribution! Great first issue!

rohitgr7 commented 4 years ago

Can you check whether apex is installed correctly, if you are using it? Similar issues:
https://github.com/NVIDIA/apex/issues/552
https://github.com/bshall/ZeroSpeech/issues/4

williamFalcon commented 4 years ago

Looks like this is an Apex issue? Will reopen if the problem persists.

ethanjperez commented 4 years ago

@williamFalcon I'm getting this error right after upgrading PL from 0.7.6 to 0.8.4. Apex was working fine before the upgrade.

[Edit] The PL upgrade also upgraded pytorch to 1.5.1, which may be causing the issue (I think the version no longer matches my apex installation).

williamFalcon commented 4 years ago

Can you install master? We fixed it there.

ethanjperez commented 4 years ago

I rolled back to 0.8.1 (from 0.8.4) and that worked for me.

Side note: is there a way to keep PL from upgrading my pytorch version from 1.4.0 to 1.5.1 when I upgrade PL? Or was that required? The upgrade is causing some dependency headaches.

williamFalcon commented 4 years ago

It really shouldn't upgrade... @Borda, mind checking?

ethanjperez commented 4 years ago

I ran pip install -I pytorch-lightning==0.8.1 and got the output below (maybe I shouldn't have used the -I or --ignore-installed flag?):

Collecting pytorch-lightning==0.8.1
  Downloading pytorch_lightning-0.8.1-py3-none-any.whl (293 kB)
     |████████████████████████████████| 293 kB 269 kB/s
Collecting numpy>=1.15
  Downloading numpy-1.19.0-cp37-cp37m-macosx_10_9_x86_64.whl (15.3 MB)
     |████████████████████████████████| 15.3 MB 5.2 MB/s
Collecting tensorboard>=1.14
  Downloading tensorboard-2.2.2-py3-none-any.whl (3.0 MB)
     |████████████████████████████████| 3.0 MB 18.1 MB/s
Collecting torch>=1.3
  Downloading torch-1.5.1-cp37-none-macosx_10_9_x86_64.whl (80.5 MB)
     |████████████████████████████████| 80.5 MB 33.7 MB/s
Processing /Users/ethanperez/Library/Caches/pip/wheels/56/b0/fe/4410d17b32f1f0c3cf54cdfb2bc04d7b4b8f4ae377e2229ba0/future-0.18.2-py3-none-any.whl
Collecting PyYAML>=5.1
  Downloading PyYAML-5.3.1.tar.gz (269 kB)
     |████████████████████████████████| 269 kB 13.6 MB/s
Collecting tqdm>=4.41.0
  Downloading tqdm-4.47.0-py2.py3-none-any.whl (66 kB)
     |████████████████████████████████| 66 kB 11.8 MB/s
Collecting google-auth<2,>=1.6.3
  Downloading google_auth-1.18.0-py2.py3-none-any.whl (90 kB)
     |████████████████████████████████| 90 kB 20.4 MB/s
Collecting setuptools>=41.0.0
  Downloading setuptools-49.1.0-py3-none-any.whl (789 kB)
     |████████████████████████████████| 789 kB 31.6 MB/s
Collecting six>=1.10.0
  Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Using cached google_auth_oauthlib-0.4.1-py2.py3-none-any.whl (18 kB)
Collecting wheel>=0.26; python_version >= "3"
  Using cached wheel-0.34.2-py2.py3-none-any.whl (26 kB)
Collecting grpcio>=1.24.3
  Downloading grpcio-1.30.0-cp37-cp37m-macosx_10_9_x86_64.whl (2.8 MB)
     |████████████████████████████████| 2.8 MB 23.9 MB/s
Collecting protobuf>=3.6.0
  Downloading protobuf-3.12.2-cp37-cp37m-macosx_10_9_x86_64.whl (1.3 MB)
     |████████████████████████████████| 1.3 MB 28.2 MB/s
Processing /Users/ethanperez/Library/Caches/pip/wheels/8e/28/49/fad4e7f0b9a1227708cbbee4487ac8558a7334849cb81c813d/absl_py-0.9.0-cp37-none-any.whl
Collecting markdown>=2.6.8
  Downloading Markdown-3.2.2-py3-none-any.whl (88 kB)
     |████████████████████████████████| 88 kB 4.8 MB/s
Collecting werkzeug>=0.11.15
  Downloading Werkzeug-1.0.1-py2.py3-none-any.whl (298 kB)
     |████████████████████████████████| 298 kB 18.4 MB/s
Collecting requests<3,>=2.21.0
  Downloading requests-2.24.0-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 914 kB/s
Collecting tensorboard-plugin-wit>=1.6.0
  Downloading tensorboard_plugin_wit-1.7.0-py3-none-any.whl (779 kB)
     |████████████████████████████████| 779 kB 49.0 MB/s
Collecting pyasn1-modules>=0.2.1
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting cachetools<5.0,>=2.0.0
  Downloading cachetools-4.1.1-py3-none-any.whl (10 kB)
Collecting rsa<5,>=3.1.4; python_version >= "3"
  Downloading rsa-4.6-py3-none-any.whl (47 kB)
     |████████████████████████████████| 47 kB 9.1 MB/s
Collecting requests-oauthlib>=0.7.0
  Using cached requests_oauthlib-1.3.0-py2.py3-none-any.whl (23 kB)
Collecting importlib-metadata; python_version < "3.8"
  Downloading importlib_metadata-1.7.0-py2.py3-none-any.whl (31 kB)
Collecting chardet<4,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
  Downloading urllib3-1.25.9-py2.py3-none-any.whl (126 kB)
     |████████████████████████████████| 126 kB 22.1 MB/s
Collecting certifi>=2017.4.17
  Downloading certifi-2020.6.20-py2.py3-none-any.whl (156 kB)
     |████████████████████████████████| 156 kB 20.1 MB/s
Collecting idna<3,>=2.5
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 3.5 MB/s
Collecting pyasn1<0.5.0,>=0.4.6
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting oauthlib>=3.0.0
  Using cached oauthlib-3.1.0-py2.py3-none-any.whl (147 kB)
Collecting zipp>=0.5
  Using cached zipp-3.1.0-py3-none-any.whl (4.9 kB)
Building wheels for collected packages: PyYAML
  Building wheel for PyYAML (setup.py) ... done
  Created wheel for PyYAML: filename=PyYAML-5.3.1-cp37-cp37m-macosx_10_9_x86_64.whl size=44625 sha256=09dd6e05082204c316cbc5ce463b2a72c1bb69138ecb6cb4da3237e8751baebf
  Stored in directory: /Users/ethanperez/Library/Caches/pip/wheels/5e/03/1e/e1e954795d6f35dfc7b637fe2277bff021303bd9570ecea653
Successfully built PyYAML
Installing collected packages: numpy, pyasn1, pyasn1-modules, setuptools, cachetools, rsa, six, google-auth, chardet, urllib3, certifi, idna, requests, oauthlib, requests-oauthlib, google-auth-oauthlib, wheel, grpcio, protobuf, absl-py, zipp, importlib-metadata, markdown, werkzeug, tensorboard-plugin-wit, tensorboard, future, torch, PyYAML, tqdm, pytorch-lightning
Successfully installed PyYAML-5.3.1 absl-py-0.9.0 cachetools-4.1.1 certifi-2020.6.20 chardet-3.0.4 future-0.18.2 google-auth-1.18.0 google-auth-oauthlib-0.4.1 grpcio-1.30.0 idna-2.10 importlib-metadata-1.7.0 markdown-3.2.2 numpy-1.19.0 oauthlib-3.1.0 protobuf-3.12.2 pyasn1-0.4.8 pyasn1-modules-0.2.8 pytorch-lightning-0.8.4 requests-2.24.0 requests-oauthlib-1.3.0 rsa-4.6 setuptools-49.1.0 six-1.15.0 tensorboard-2.2.2 tensorboard-plugin-wit-1.7.0 torch-1.5.1 tqdm-4.47.0 urllib3-1.25.9 werkzeug-1.0.1 wheel-0.34.2 zipp-3.1.0

Borda commented 4 years ago

@ethanjperez the install log above looks correct... The easiest way is to specify exact versions at installation if you need them, or a range with a min and a max...
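
A minimal sketch of that suggestion, assuming the goal is to keep torch at 1.4.0 while installing PL 0.8.1: pass explicit requirement specifiers to pip so the `torch>=1.3` dependency seen in the log is not resolved to the newest release. `build_pip_command` is a hypothetical helper; it only assembles the command and does not run it. The `-I` flag is left out on purpose, since `--ignore-installed` makes pip disregard packages that are already present.

```python
import sys


def build_pip_command(requirements):
    """Return the argument list for installing the given requirement specifiers."""
    return [sys.executable, "-m", "pip", "install", *requirements]


# Exact pins: both packages are fixed to one version.
pinned = ["pytorch-lightning==0.8.1", "torch==1.4.0"]

# Range: any torch release from 1.3 up to, but not including, 1.5.
ranged = ["pytorch-lightning==0.8.1", "torch>=1.3,<1.5"]

command = build_pip_command(pinned)
range_command = build_pip_command(ranged)

print(" ".join(command))
print(" ".join(range_command))

# To actually install, pass the list to subprocess.run(command, check=True).
```

The same specifiers can be placed in a requirements file so the pins travel with the project.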