baidu-research / DeepBench

Benchmarking Deep Learning operations on different hardware
Apache License 2.0

Update AMD and NVIDIA RNN benchmarks to use *RNNForwardInference instead of *RNNForwardTraining #117

Closed mattsinc closed 4 years ago

mattsinc commented 4 years ago

As discussed in #114, the standalone AMD and NVIDIA inference passes for the RNN benchmark currently call RNNForwardTraining, which is only required when a training pass is also going to be run. This change modifies the AMD and NVIDIA RNN benchmarks to call RNNForwardInference instead, which the cuDNN and MIOpen documentation indicates is the appropriate and sufficient call for inference-only passes.

I tested this locally, and the benchmarks pass. The configurations I spot-checked also usually show small improvements (5% or less), since the inference call avoids storing the intermediate data required for training.
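To illustrate why an inference-only forward pass can be slightly cheaper, here is a minimal CPU sketch of a toy single-unit tanh RNN. It is not the cuDNN or MIOpen implementation: `ToyRnn`, `forward_inference`, `forward_training`, and the `reserve` buffer are hypothetical names chosen for illustration. Both functions produce the same outputs, but the training-style pass also writes per-step intermediates that a backward pass would need.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Toy single-unit tanh RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1}).
// Illustrative only; the real libraries work on batched tensors on the GPU.
struct ToyRnn {
    float w_x;  // input weight
    float w_h;  // recurrent weight
};

// Inference-style forward pass: only the outputs are written.
std::vector<float> forward_inference(const ToyRnn& rnn,
                                     const std::vector<float>& x) {
    std::vector<float> y(x.size());
    float h = 0.0f;  // initial hidden state
    for (std::size_t t = 0; t < x.size(); ++t) {
        h = std::tanh(rnn.w_x * x[t] + rnn.w_h * h);
        y[t] = h;
    }
    return y;
}

// Training-style forward pass: same outputs, but each step's pre-activation
// is also saved into `reserve` so a later backward pass could reuse it.
std::vector<float> forward_training(const ToyRnn& rnn,
                                    const std::vector<float>& x,
                                    std::vector<float>& reserve) {
    reserve.assign(x.size(), 0.0f);
    std::vector<float> y(x.size());
    float h = 0.0f;  // initial hidden state
    for (std::size_t t = 0; t < x.size(); ++t) {
        const float pre = rnn.w_x * x[t] + rnn.w_h * h;
        reserve[t] = pre;  // extra write that inference does not need
        h = std::tanh(pre);
        y[t] = h;
    }
    return y;
}
```

In this toy model the only difference is one extra buffer and one extra store per time step, which is consistent with a small rather than dramatic speedup when the backward pass is never run.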

For the AMD code, I needed to add rocBLAS to the Makefile path to get gemm to compile when I rebuilt everything. I suspect this is only required if rocBLAS is installed in a non-standard location (i.e., other than /opt/rocm), but I'm not able to test this, so I included it as a separate commit in case others run into the same problem. I can break it out into a separate pull request if people prefer.

mattsinc commented 4 years ago

@sharannarang, @newshaa: I'm not able to add reviewers, so I'm just checking: what is the best next step?