I am trying to run PRNN on a GeForce RTX 2080 (46 SMs, compute capability 7.5).
I've tried the following tile sizes:
TileConfig<24, 1152, 1152, 192, 288, 6, 36, direction, T>
TileConfig<32, 1024, 1024, 128, 256, 4, 32, direction, T>
TileConfig<32, 1024, 1024, 64, 512, 1, 32, direction, T>
TileConfig<40, 640, 640, 80, 128, 5, 4, direction, T>
Running the benchmark with any of these, using batch size = 4, timesteps = 20, and the maximum layer size for each tile configuration, the fastest I can get is 0.00478542 TFLOPS in the forward pass.
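For context, here is a rough back-of-the-envelope estimate of the work involved, assuming the benchmark counts only the recurrent matrix multiply (2·H²·B·T operations, which may not match PRNN's exact accounting):

```python
# Rough FLOP estimate for one forward pass of a persistent RNN layer.
# Assumes only the recurrent GEMM is counted: 2 * H^2 ops per sample
# per timestep. H = 1152 is the largest layer size tried above.
hidden_size = 1152
batch_size = 4
timesteps = 20

flops = 2 * hidden_size**2 * batch_size * timesteps
tflops_measured = 0.00478542  # throughput reported by the benchmark

runtime_s = flops / (tflops_measured * 1e12)
print(f"total FLOPs: {flops:,}")
print(f"implied runtime: {runtime_s * 1e3:.1f} ms")
```

At batch size 4 the total work is only about 0.2 GFLOP, so even modest fixed overheads dominate, which may partly explain the low measured throughput.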
Are the tile sizes inappropriate, or is the issue something else?
Great work putting this together!
Thank you.