infrontofme closed this issue 5 years ago
Hi,
In my implementation, I multiply the weight tensor by a binary mask, which should not significantly affect the training speed. Could you describe how you implemented the dropping operator? Thanks!
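For reference, a minimal sketch of the masking idea described above. It uses NumPy as a stand-in for the framework, and the names `make_drop_mask`, `masked_linear`, and `drop_rate` are hypothetical, chosen only for illustration. It is not the actual implementation from the repository.

```python
import numpy as np

def make_drop_mask(shape, drop_rate, rng):
    """Return a 0/1 mask; each entry is 0 with probability drop_rate."""
    return (rng.random(shape) >= drop_rate).astype(np.float32)

def masked_linear(x, weight, mask):
    """Linear layer whose weights are multiplied elementwise by a binary mask."""
    # One elementwise multiply over the whole tensor, then one matmul.
    return x @ (weight * mask)

rng = np.random.default_rng(0)
weight = rng.standard_normal((4, 3)).astype(np.float32)  # assumed layer shape
mask = make_drop_mask(weight.shape, drop_rate=0.5, rng=rng)
x = rng.standard_normal((2, 4)).astype(np.float32)       # batch of 2 inputs
y = masked_linear(x, weight, mask)
```

Because the mask is applied in a single elementwise operation, the extra cost per step should be small compared with the matrix multiply itself.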
Thank you for your reply.
I also multiply the weight tensor by a binary mask. I found that the computation graph behaves differently in TensorFlow and PyTorch. TensorFlow's graph is static, so when I used a loop in my code, the graph grew larger and the computation became slower. After I fixed my code, the problem was solved.
Your work is awesome!
Yes, avoid Python for loops in the training step in any deep learning framework. They cause CUDA kernels to be started and stopped frequently, which slows training down considerably. Glad to hear that the problem has been fixed!
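To illustrate the point about loops, here is a small sketch comparing a per-element loop with a single vectorized multiply. It uses NumPy rather than TensorFlow or PyTorch, so it only shows the pattern. The function names are hypothetical, and the assumption is that in a GPU framework each loop iteration would typically issue its own small operation, where the vectorized form issues one.

```python
import numpy as np

def mask_weights_loop(weight, mask):
    """Slow pattern: apply the mask one element at a time."""
    out = np.empty_like(weight)
    for i in range(weight.shape[0]):
        for j in range(weight.shape[1]):
            out[i, j] = weight[i, j] * mask[i, j]
    return out

def mask_weights_vectorized(weight, mask):
    """Preferred pattern: one elementwise multiply over the whole tensor."""
    return weight * mask

rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 8)).astype(np.float32)
mask = (rng.random(weight.shape) >= 0.5).astype(np.float32)

looped = mask_weights_loop(weight, mask)
vectorized = mask_weights_vectorized(weight, mask)
```

Both functions compute the same result. The difference is only in how many separate operations are issued, which is what matters for speed on a GPU or in a static graph.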
Hi, I am reproducing your work in TensorFlow, but I found that the dropping step takes a lot of time during training. Have you encountered this problem? What do you think the reason might be?