d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
https://D2L.ai

TF implementation of RNN-scratch runs much slower than the other implementations #1451

Open astonzhang opened 4 years ago

astonzhang commented 4 years ago

http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_recurrent-neural-networks/rnn-scratch.html

On the same machine, the TF implementation runs for about 9 minutes, while the MXNet and PyTorch implementations run in roughly 3 to 5 minutes.

@abhinavsp0730, can you take a look? Thanks.
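One way to narrow this down (a minimal diagnostic sketch, not part of the book's code) is to confirm that TensorFlow actually sees the GPU and to log where ops get placed while reproducing the timing gap:

```python
import tensorflow as tf

# If this prints an empty list, the notebook is silently running on the CPU,
# which could explain a 2-3x slowdown relative to MXNet/PyTorch.
print(tf.config.list_physical_devices('GPU'))

# Log the device each op runs on while re-running the training loop.
tf.debugging.set_log_device_placement(True)
```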

abhinavsp0730 commented 4 years ago

Hi @astonzhang, thanks, good catch. I've raised a PR to fix the issue. But the TF notebook runs slower than the MXNet/PyTorch ones because, without an explicit device strategy, the TF training loop may not be running on the GPU.

astonzhang commented 4 years ago

Thanks. The RNN-from-scratch notebook does train on one GPU for all the frameworks, so I guess this is probably not the root cause. Maybe @terrytangyuan can help you with your PR on this.

abhinavsp0730 commented 4 years ago

@astonzhang I guess it doesn't, because here http://d2l.ai/chapter_convolutional-neural-networks/lenet.html (train_ch6) we have to explicitly define a one-device strategy in order to utilize the GPU. Thanks.
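For reference, a minimal sketch of the pattern `train_ch6` relies on: build the model under a `tf.distribute.OneDeviceStrategy` scope so that variables and the training step are pinned to the GPU. The layer sizes below are placeholders, not the book's actual RNN.

```python
import tensorflow as tf

# Prefer the first visible GPU; fall back to the CPU otherwise.
gpus = tf.config.list_physical_devices('GPU')
device_name = '/GPU:0' if gpus else '/CPU:0'

# Everything created inside the strategy scope (variables, optimizer state)
# lives on the chosen device, so the training step runs there too.
strategy = tf.distribute.OneDeviceStrategy(device_name)
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.SimpleRNN(256, return_sequences=True),
        tf.keras.layers.Dense(28),  # placeholder output size, not the book's vocab_size
    ])
    optimizer = tf.keras.optimizers.SGD(learning_rate=1.0)
```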