Open plutols opened 2 years ago
I have found that a deadlock happens when use_multiprocessing=True. Using a Dataset from PyTorch to build the data generator may be a better choice if you want to load data in parallel.
I now use keras.utils.Sequence and it can load data in parallel, but training is still slow: about 15 s/step with batch_size=8. I think the computational complexity of DprnnBlock may be too high. Also, the GPU memory usage is very low, only 151 MB. Do you have any idea how I can speed it up?
Check the CPU and GPU usage. Maybe something is wrong with your TensorFlow installation and CUDA is unavailable. Check the versions of tf and keras. Replacing the LSTM with CuDNNLSTM can also speed up training.
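A quick way to run the check suggested above is to ask TensorFlow what it can see. This is only a sketch: the helper name `describe_tf_gpu` is made up for illustration, and the TF 1.x branch assumes `tf.test.is_gpu_available()` is present, as it is in 1.15.

```python
def describe_tf_gpu():
    """Return a small report on whether TensorFlow can see a CUDA GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None, "built_with_cuda": False, "gpus": []}

    report = {
        "tensorflow": tf.__version__,
        # False means a CPU-only build: no CUDA version will help.
        "built_with_cuda": tf.test.is_built_with_cuda(),
    }
    if hasattr(tf.config, "list_physical_devices"):
        # TF 2.1 and later.
        report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    else:
        # Assumed TF 1.x fallback; only reports whether any GPU is usable.
        report["gpus"] = ["GPU"] if tf.test.is_gpu_available() else []
    return report


print(describe_tf_gpu())
```

If `gpus` comes back empty while `nvidia-smi` shows a card, the TensorFlow build and the installed CUDA version probably do not match, which would be consistent with very low GPU memory usage and slow steps.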
---Original--- Date: Thu, Mar 31, 2022 19:27; Subject: Re: [Le-Xiaohuai-speech/DPCRN_DNS3] why we can not set use_multiprocessing=True (Issue #16)
What is your training speed, and how much GPU memory do you use?
1 s/batch, 12 GB
Oh my god, my TensorFlow version is 1.15.0 and my CUDA is 11.4. So should I upgrade TensorFlow?
You can update TensorFlow to 2.x and training still works. If you want to use tf 1.x on CUDA 11, install nvidia-tensorflow with:
pip install --upgrade pip
pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]
pip install nvidia-tensorboard==1.15
it works, thanks!
@plutols @Le-Xiaohuai-speech I am curious how using keras.utils.Sequence helps with multiprocessing. When I tried it, training would not start. It seems to get stuck after displaying "Epoch 1/200".
Could you please tell me how you used keras.utils.Sequence?
self.model.fit_generator(
    data_generator.generator(batch_size=self.batch_size, validation=False),
    validation_data=data_generator.generator(batch_size=self.batch_size, validation=True),
    epochs=self.max_epochs,
    steps_per_epoch=data_generator.train_length // self.batch_size,
    validation_steps=self.batch_size,
    use_multiprocessing=True,
When I set use_multiprocessing=True, training cannot start, but when I set use_multiprocessing=False, training is very slow. Any idea how I can use multiprocessing?
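For reference, here is a minimal sketch of what a keras.utils.Sequence loader could look like, as opposed to a plain Python generator. It is not the loader from this repository: the class name `PairSequence` and the `noisy`/`clean` arguments are hypothetical, and the `object` fallback exists only so the sketch can be read and run without TensorFlow installed. The point is that a Sequence is indexed by batch number, so each worker can fetch a different batch without sharing generator state.

```python
import math

try:
    from tensorflow.keras.utils import Sequence
except ImportError:
    Sequence = object  # fallback for reading the sketch without TensorFlow


class PairSequence(Sequence):
    """Hypothetical loader: each __getitem__ call returns one (input, target) batch."""

    def __init__(self, noisy, clean, batch_size=8):
        super().__init__()
        # Assumed to be NumPy arrays (or anything sliceable) of equal length.
        self.noisy = noisy
        self.clean = clean
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        return math.ceil(len(self.noisy) / self.batch_size)

    def __getitem__(self, idx):
        # A real loader would read and preprocess audio files here,
        # which is the work the extra workers run in parallel.
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        return self.noisy[lo:hi], self.clean[lo:hi]


# Possible usage (not executed here); where `workers` and
# `use_multiprocessing` go depends on the Keras version:
# model.fit(PairSequence(noisy, clean), epochs=200,
#           workers=4, use_multiprocessing=True)
```

In older Keras and TF 2.x releases, `workers` and `use_multiprocessing` are arguments of `fit`/`fit_generator`. In Keras 3 they moved to the constructor of the Sequence replacement. Switching to a Sequence does not by itself guarantee that the hang after "Epoch 1/200" goes away, so it is worth trying `workers` greater than 1 with use_multiprocessing=False first.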