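Thank you very much for your excellent work. I am running into the following error while training my model in a virtual environment. Training stops after 11 steps with `name 'str2optimizer8bit_blockwise' is not defined`. Full log: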
The passed generator was created on 'cpu' even though a tensor on cuda:0 was expected. Tensors will be created on 'cpu' and then moved to cuda:0. Note that one can probably slighly speed up this function by passing a generator that was created on the cuda:0 device.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:24<00:00, 1.65it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 33.16it/s]
Moviepy - Building video ./exp_output/stage2/validation/1_1_1.mp4.
MoviePy - Writing audio in 1_1_1TEMP_MPY_wvf_snd.mp4
MoviePy - Done.
Moviepy - Writing video ./exp_output/stage2/validation/1_1_1.mp4
Moviepy - Done !
Moviepy - video ready ./exp_output/stage2/validation/1_1_1.mp4
Steps: 0%| | 1/3000 [06:10<6:40:15, 8.01s/it, lr=1e-5, step_loss=0.271, td=3.17s][2024-08-10 10:01:34,981] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2147483648, reducing to 1073741824
Steps: 0%| | 2/3000 [06:20<185:23:13, 222.61s/it, lr=1e-5, step_loss=0.258, td=4.30s][2024-08-10 10:01:44,611] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1073741824, reducing to 536870912
Steps: 0%|▏ | 3/3000 [06:30<104:21:45, 125.36s/it, lr=1e-5, step_loss=0.371, td=3.83s][2024-08-10 10:01:53,991] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 536870912, reducing to 268435456
Steps: 0%|▏ | 4/3000 [06:39<66:13:19, 79.57s/it, lr=1e-5, step_loss=0.374, td=3.64s][2024-08-10 10:02:03,559] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 268435456, reducing to 134217728
Steps: 0%|▎ | 5/3000 [06:49<45:11:54, 54.33s/it, lr=1e-5, step_loss=0.373, td=4.09s][2024-08-10 10:02:13,085] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 134217728, reducing to 67108864
Steps: 0%|▎ | 6/3000 [06:58<32:30:51, 39.10s/it, lr=1e-5, step_loss=0.262, td=3.67s][2024-08-10 10:02:21,432] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 67108864, reducing to 33554432
Steps: 0%|▍ | 7/3000 [07:07<24:08:47, 29.04s/it, lr=1e-5, step_loss=0.259, td=3.89s][2024-08-10 10:02:31,178] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 33554432, reducing to 16777216
Steps: 0%|▍ | 8/3000 [07:17<19:01:57, 22.90s/it, lr=1e-5, step_loss=0.297, td=3.85s][2024-08-10 10:02:40,832] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16777216, reducing to 8388608
Steps: 0%|▌ | 9/3000 [07:26<15:35:07, 18.76s/it, lr=1e-5, step_loss=0.284, td=3.83s][2024-08-10 10:02:50,562] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8388608, reducing to 4194304
Steps: 0%|▌ | 10/3000 [07:36<13:15:55, 15.97s/it, lr=1e-5, step_loss=0.316, td=3.91s][2024-08-10 10:03:00,072] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4194304, reducing to 2097152
Steps: 0%|▋ | 11/3000 [07:45<11:37:07, 13.99s/it, lr=1e-5, step_loss=0.243, td=3.69s]
ERROR:root:Failed to execute the training process: name 'str2optimizer8bit_blockwise' is not defined
ERROR:root:Failed to execute the training process: name 'str2optimizer8bit_blockwise' is not defined
ERROR:root:Failed to execute the training process: name 'str2optimizer8bit_blockwise' is not defined