plutols commented 2 years ago

self.model.fit_generator(
    data_generator.generator(batch_size=self.batch_size, validation=False),
    validation_data=data_generator.generator(batch_size=self.batch_size, validation=True),
    epochs=self.max_epochs,
    steps_per_epoch=data_generator.train_length // self.batch_size,
    validation_steps=self.batch_size,
    use_multiprocessing=True,
    callbacks=[checkpointer, reduce_lr, csv_logger, early_stopping])

When I set use_multiprocessing=True, training cannot start, but when I set use_multiprocessing=False, training is very slow. Any idea how I can use multiprocessing?

Le-Xiaohuai-speech commented 2 years ago

I have found that a deadlock happens when use_multiprocessing=True. Using a Dataset from PyTorch as the data generator may be a better choice if you want to load data in parallel.

plutols commented 2 years ago

I now use keras.utils.Sequence, and it can load data in parallel, but training is still slow: about 15 s/step with batch_size=8. I think the computational complexity of DprnnBlock may be too high. Also, GPU memory usage is very low, only 151M. Do you have any idea how I can speed it up?

Le-Xiaohuai-speech commented 2 years ago

Check the CPU and GPU usage. Maybe there is something wrong with your TensorFlow installation and CUDA is unavailable. Check the versions of tf and keras. Replacing the LSTM with CuDNNLSTM can speed up training.
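One way to run the check suggested above is sketched below. The helper name gpu_report is made up for illustration and is not part of the repository; it only uses calls that exist in both TF 1.15 and TF 2.x. An empty "gpus" list would suggest that training is falling back to the CPU, which would be consistent with very low GPU memory usage.

```python
def gpu_report():
    """Return the TensorFlow version, CUDA build flag and visible GPUs.

    Hypothetical helper for illustration; returns None if TensorFlow
    cannot be imported at all.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # tf.config.list_physical_devices exists from TF 2.1 onwards;
    # TF 1.14/1.15 only provide it under tf.config.experimental.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices

    return {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [device.name for device in list_devices("GPU")],
    }


print(gpu_report())
```

If "built_with_cuda" is True but "gpus" is empty, a mismatch between the installed TensorFlow build and the local CUDA version is a likely cause worth ruling out.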


plutols commented 2 years ago

What is your training speed, and how much GPU memory do you use?

Le-Xiaohuai-speech commented 2 years ago

1 s/batch, 12 GB


plutols commented 2 years ago

Oh my god, my TensorFlow version is 1.15.0 and my CUDA is 11.4. So should I upgrade TensorFlow?

Le-Xiaohuai-speech commented 2 years ago

You can update TensorFlow to 2.x and the training step still works. If you want to use tf 1.x on CUDA 11, install nvidia-tensorflow with:

pip install --upgrade pip
pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]
pip install nvidia-tensorboard==1.15


plutols commented 2 years ago

it works, thanks!

rohithmars commented 1 year ago

@plutols @Le-Xiaohuai-speech I am curious how keras.utils.Sequence helps with multiprocessing. When I tried it, training could not start. It seems to be stuck after displaying epoch 1/200.

Could you please tell me how you used keras.utils.Sequence?
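The thread does not show the Sequence that was actually used, so the following is only a minimal sketch of the general pattern, not the setup from this repository. The class name NoisyCleanSequence and its in-memory noisy/clean arrays are assumptions for illustration. The key difference from a plain generator is that each batch is addressed by index, so worker processes can fetch batches independently instead of sharing one iterator. This does not guarantee that the hang disappears.

```python
import math

import numpy as np

try:
    from tensorflow.keras.utils import Sequence
except ImportError:
    # Fallback so the indexing logic can be exercised without TensorFlow.
    Sequence = object


class NoisyCleanSequence(Sequence):
    """Hypothetical loader: each __getitem__ call returns one full batch."""

    def __init__(self, noisy, clean, batch_size=8):
        super().__init__()
        self.noisy = noisy
        self.clean = clean
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.noisy) / self.batch_size)

    def __getitem__(self, idx):
        # Batches are addressed by index, so workers never share an iterator.
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        return self.noisy[lo:hi], self.clean[lo:hi]


# Intended usage with the TF 1.x / Keras 2 API (left as a comment because it
# needs a compiled model):
# model.fit_generator(NoisyCleanSequence(x, y, batch_size=8), epochs=200,
#                     workers=4, use_multiprocessing=True)
```

With a Sequence, steps_per_epoch can be omitted because Keras reads the number of batches from __len__.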

Le-Xiaohuai-speech / DPCRN_DNS3

why we can not set use_multiprocessing=True #16
