baidu-research / persistent-rnn

Fast Recurrent Networks Library

Suitable tile size for GeForce RTX 2080 #19


Shikherneo2 commented 4 years ago

Great work putting this together!

I am trying to run PRNN on a GeForce RTX 2080 (46 SMs, compute capability 7.5). I've tried the following tile sizes (a consistency check of the arithmetic follows below):

- `TileConfig<24, 1152, 1152, 192, 288, 6, 36, direction, T>`
- `TileConfig<32, 1024, 1024, 128, 256, 4, 32, direction, T>`
- `TileConfig<32, 1024, 1024, 64, 512, 1, 32, direction, T>`
- `TileConfig<40, 640, 640, 80, 128, 5, 4, direction, T>`
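From these four examples, the first template parameter looks like the total thread-block count, equal to (gridRows / blockRows) × (gridCols / blockCols), with (blockRows / threadRows) × (blockCols / threadCols) threads per block. That is my inference from the configs above, not the documented contract of `TileConfig`. Here is a small standalone sketch that checks that arithmetic, plus a hypothetical 46-block candidate for the 2080:

```cpp
// Minimal consistency check for the tile arithmetic the configs above seem to
// follow. The relationships below are inferred from the four examples, not
// taken from the library's documentation.
#include <cstdio>

struct Tile {
    // blocks = first TileConfig parameter; appears to equal the total number
    // of thread blocks (typically one per SM for a persistent kernel).
    int blocks, gridRows, gridCols, blockRows, blockCols, threadRows, threadCols;
};

constexpr bool consistent(Tile t) {
    return t.gridRows % t.blockRows == 0 && t.gridCols % t.blockCols == 0
        && t.blockRows % t.threadRows == 0 && t.blockCols % t.threadCols == 0
        // Block tiles must exactly cover the grid tile, one block per tile.
        && (t.gridRows / t.blockRows) * (t.gridCols / t.blockCols) == t.blocks
        // CUDA caps a thread block at 1024 threads.
        && (t.blockRows / t.threadRows) * (t.blockCols / t.threadCols) <= 1024;
}

// The four configs from the issue all pass this check.
static_assert(consistent({24, 1152, 1152, 192, 288, 6, 36}), "");
static_assert(consistent({32, 1024, 1024, 128, 256, 4, 32}), "");
static_assert(consistent({32, 1024, 1024, 64, 512, 1, 32}), "");
static_assert(consistent({40, 640, 640, 80, 128, 5, 4}), "");

// Hypothetical 46-block candidate for the RTX 2080 (untested):
// TileConfig<46, 1472, 1472, 736, 64, 23, 8, direction, T>
// (1472/736) * (1472/64) = 2 * 23 = 46 blocks; (736/23) * (64/8) = 256 threads.
static_assert(consistent({46, 1472, 1472, 736, 64, 23, 8}), "");

int main() { std::puts("tile arithmetic checks pass"); }
```

Since 46 = 2 × 23 has no friendlier factorization, any exact 46-block config ends up asymmetric like this candidate, which may itself be part of the difficulty.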

Running the benchmark with any of these, using batch size 4, timesteps 20, and the maximum layer size for each tile configuration, the fastest I can get is 0.00478542 TFLOP/s in the forward pass.
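For scale, if the benchmark counts the recurrent work as roughly 2 · layer_size² · batch_size · timesteps FLOPs (an assumption on my part about its accounting), then at layer size 1152 a forward pass is only about 2 · 1152² · 4 · 20 ≈ 2.1 × 10⁸ FLOPs, so 0.0048 TFLOP/s would correspond to roughly 44 ms per pass. With a batch of 4 over 20 timesteps, the problem may also simply be too small to hide launch and synchronization overhead, independent of the tile sizes.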

Are the tile sizes inappropriate, or is the issue something else?

Thank you.