Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0
28.03k stars 3.36k forks source link

tqdm 'NoneType' object is not iterable with k-fold cross-validation #2614

Closed jinhonglu closed 3 years ago

jinhonglu commented 4 years ago

🐛 Bug

I tried to run k-fold cross-validation, this gives me a tqdm 'NoneType' object is not iterable on a Linux-based server, but not on a Macbook. I try to keep the packages in both sides to be same, but it still has the same error in the Linux-based server

Code sample


import pytorch_lightning as pl
import models
from sklearn.model_selection import KFold
from torch.utils.data import Subset, DataLoader
import numpy as np
import helper
import torch.nn as tn

# Loading the dataset
input_motion = np.load('processed_data/Train_input_motion_65dim_delta1_delta2_scaleNorm.npz',
                       allow_pickle=True)['clips']
target_motion = np.load('processed_data/Train_target_motion_45dim_delta1_delta2_scaleNorm.npz',
                        allow_pickle=True)['clips']
audio = np.load('processed_data/Train_audio_wav_500dim_scaleNorm.npz',
                allow_pickle=True)['audio_data']

input_motion = [data for file in input_motion for data in file]
target_motion = [data for file in target_motion for data in file]
audio = [data for file in audio for data in file]

motion_data = helper.MotionEnDeDataset(input_motion, target_motion, audio, 1)
MotionEnDe = models.MotionEncoderDecoder(en_in_dim=en_in_dim,
                                         en_out_dim=en_out_dim,
                                         en_h_depth=en_h_depth,
                                         en_h_dims=en_h_dims,
                                         en_h_acts=en_h_acts,
                                         en_h_bns=en_h_bns,
                                         en_h_dos=en_h_dos,
                                         en_out_act=en_out_act,

                                         de_in_dim=de_in_dim,
                                         de_out_dim=de_out_dim,
                                         de_h_depth=de_h_depth,
                                         de_h_dims=de_h_dims,
                                         de_h_acts=de_h_acts,
                                         de_h_bns=de_h_bns,
                                         de_h_dos=de_h_dos,
                                         de_out_act=de_out_act)
trainer = pl.Trainer(default_root_dir='./Model/test5/', max_epochs=1, fast_dev_run=True)
kf = KFold(n_splits=5, random_state=48, shuffle=True).split([x for x in range(len(input_motion))])

for train_index, valid_index in kf:
    train_data = Subset(motion_data, train_index)
    valid_data = Subset(motion_data, valid_index)
    trainer.fit(MotionEnDe,
                train_dataloader=DataLoader(train_data, batch_size=8192, shuffle=True, num_workers=4),
                val_dataloaders=DataLoader(valid_data, batch_size=8192, num_workers=4))

Trackback

  | Name    | Type          | Params
------------------------------------------
0 | encoder | MotionEncoder | 515 K
1 | decoder | MotionDecoder | 467 K
Epoch 1: 100%|#################| 2/2 [00:02<00:00,  1.25s/it, loss=-1.985, v_num=4, Motion_Encoder_loss=-5.13, Motion_Decoder_loss=1.16]

  | Name    | Type          | Params
------------------------------------------
0 | encoder | MotionEncoder | 515 K
1 | decoder | MotionDecoder | 467 K
Epoch 1: 100%|################| 2/2 [00:01<00:00,  1.02it/s, loss=-2.081, v_num=4, Motion_Encoder_loss=-5.34, Motion_Decoder_loss=0.988]
  | Name    | Type          | Params
------------------------------------------
0 | encoder | MotionEncoder | 515 K
1 | decoder | MotionDecoder | 467 K
Epoch 1: 100%|################| 2/2 [00:02<00:00,  1.03s/it, loss=-2.081, v_num=4, Motion_Encoder_loss=-5.34, Motion_Decoder_loss=0.988]
Epoch 1: 100%|################| 2/2 [00:02<00:00,  1.07s/it, loss=-2.163, v_num=4, Motion_Encoder_loss=-5.54, Motion_Decoder_loss=0.888]
  | Name    | Type          | Params
------------------------------------------
0 | encoder | MotionEncoder | 515 K
1 | decoder | MotionDecoder | 467 K
Epoch 1: 100%|################| 2/2 [00:02<00:00,  1.12s/it, loss=-2.163, v_num=4, Motion_Encoder_loss=-5.54, Motion_Decoder_loss=0.888]
Epoch 1: 100%|################| 2/2 [00:02<00:00,  1.03s/it, loss=-2.223, v_num=4, Motion_Encoder_loss=-5.62, Motion_Decoder_loss=0.808]
  | Name    | Type          | Params
------------------------------------------
0 | encoder | MotionEncoder | 515 K
1 | decoder | MotionDecoder | 467 K
Epoch 1: 100%|################| 2/2 [00:02<00:00,  1.08s/it, loss=-2.223, v_num=4, Motion_Encoder_loss=-5.62, Motion_Decoder_loss=0.808]
Epoch 1: 100%|################| 2/2 [00:02<00:00,  1.05s/it, loss=-2.279, v_num=4, Motion_Encoder_loss=-5.77, Motion_Decoder_loss=0.769]Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "/Desktop/py3_env/lib/python3.6/site-packages/tqdm/std.py", line 1086, in __del__
  File "/Desktop/py3_env/lib/python3.6/site-packages/tqdm/std.py", line 1293, in close
  File "/Desktop/py3_env/lib/python3.6/site-packages/tqdm/std.py", line 1471, in display
  File "/Desktop/py3_env/lib/python3.6/site-packages/tqdm/std.py", line 1089, in __repr__
  File "/Desktop/py3_env/lib/python3.6/site-packages/tqdm/std.py", line 1433, in format_dict
TypeError: 'NoneType' object is not iterable

Expected behavior

It should save the model afterwards

Environment

* CUDA:
    - GPU:
        - GeForce RTX 2080 Ti
        - GeForce RTX 2080 Ti
        - GeForce RTX 2080 Ti
        - GeForce RTX 2080 Ti
        - GeForce RTX 2080 Ti
        - GeForce RTX 2080 Ti
    - available:         True
    - version:           10.2
* Packages:
    - numpy:             1.18.1
    - pyTorch_debug:     False
    - pyTorch_version:   1.5.0
    - pytorch-lightning: 0.8.4
    - tensorboard:       2.2.1
    - tqdm:              4.46.1
* System:
    - OS:                Linux
    - architecture:
        - 64bit
        - ELF
    - processor:         x86_64
    - python:            3.6.8
    - version:           #1 SMP Thu Nov 14 10:04:03 CST 2019

Additional context

I tried lightning 0.8.5 and tqdm 4.47 as well. All fails.

github-actions[bot] commented 4 years ago

Hi! thanks for your contribution!, great first issue!

awaelchli commented 4 years ago

Could it be that in one of your training or validation step/epoch_end you return a None value?

jinhonglu commented 4 years ago

Could it be that in one of your training or validation step/epoch_end you return a None value?

I don't think so. It is because it did not happen on my Mac and I had run multiple times on the server, all of them have this error.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

awaelchli commented 4 years ago

@jinhonglu Is there any further information you could provide? Sorry for the late request, but I would need a full code example to check if this is still an issue. Your code sample looks fine, so I cannot guess what the problem might be unless I can reproduce it. Again, sorry for the late response.

williamFalcon commented 3 years ago

side note... you never reset your model weights. So, it's not doing true cross validation. Instead it's keeping the same weights from all previous folds.

jinhonglu commented 3 years ago

@williamFalcon Thanks for pointing out.

@awaelchli That is basically the full code for that. I switched to the original pytorch usage for my project, it did not raise me this error and so there was no other more information that I could provide right now.