@vineelpratap can you try doing the following?
1) Call `af::setMemStepSize(10 * (1 << 20))` before the benchmarking loop.
2) Try running `out.eval()` (see the sketch below).
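A minimal sketch of where those two calls would go (the convolution here is just a stand-in for the real arrayfire-ml forward pass, not your actual code):

```cpp
#include <arrayfire.h>

int main() {
    // 1) Raise the memory manager's step size to 10 MiB so that allocations
    //    of slightly different sizes round up to the same bucket and can be
    //    reused instead of triggering fresh device allocations.
    af::setMemStepSize(10 * (1 << 20));

    af::array weights = af::randu(3, 3);  // stand-in for the network parameters

    for (int i = 0; i < 100; ++i) {
        af::array input = af::randu(224, 224, 3);        // placeholder input
        af::array out   = af::convolve2(input, weights); // stand-in forward pass

        // 2) Force evaluation of the output inside the loop.
        out.eval();
        af::sync();  // wait for the GPU so each iteration is fully measured
    }
    return 0;
}
```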
Also, do you want to share the changes you've made to arrayfire-ml?
@pavanky - Yes, `af::setMemStepSize(10 * (1 << 20))` fixes the issue :)
Yes, we plan to share the code with all the changes to the arrayfire-ml repo soon (< 1 month).
I have extended arrayfire-ml with cuDNN bindings and was running benchmarks to compare against the convnet benchmarks from https://github.com/soumith/convnet-benchmarks/.
The benchmarks are run in the following way: define the network once, then run and time a loop of inputs of varying size through it (a rough sketch is given below).
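Roughly, the loop looks like this (the `forward()` below is just a placeholder for the arrayfire-ml + cuDNN network, and the size range is made up):

```cpp
#include <arrayfire.h>
#include <cstdio>
#include <cstdlib>

// Placeholder for the real network's forward pass (not shown here).
static af::array forward(const af::array &input) {
    return af::convolve2(input, af::constant(1.0f / 9.0f, 3, 3));
}

int main() {
    // Define the Network (arrayfire-ml model construction goes here).

    const int lo = 64, hi = 224;   // hypothetical size range
    const int iterations = 1000;

    for (int i = 0; i < iterations; ++i) {
        // Pick a random spatial size in [lo, hi] for this iteration.
        int sz = lo + std::rand() % (hi - lo + 1);
        af::array input = af::randu(sz, sz, 3);

        af::timer t = af::timer::start();
        af::array out = forward(input);
        out.eval();
        af::sync();
        std::printf("iter %d, size %d: %f s\n", i, sz, af::timer::stop(t));
    }
    return 0;
}
```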
I was able to match the performance of the torch7 cuDNN bindings if I keep the input size constant. However, if I pass random inputs with sizes in `[lo, hi]`, the average performance is actually worse than always sending inputs of size `hi`. You can notice spikes at regular intervals, which increase the average time taken.
Note that all the buffers and arrays are initialized using the `af::array(..)` constructor (no raw `cudaMalloc` calls), and all cuDNN operations are placed on ArrayFire's CUDA stream on the device. I was wondering if the spikes at regular intervals ring a bell for you. Could the continuous memory allocations (of different sizes) be optimized by ArrayFire's memory manager?
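For context, the wiring looks roughly like this (a stripped-down sketch using ArrayFire's CUDA interop helpers, with the actual convolution call elided):

```cpp
#include <arrayfire.h>
#include <af/cuda.h>
#include <cudnn.h>

int main() {
    // Buffers come from ArrayFire, so the memory manager owns them
    // (no raw cudaMalloc anywhere).
    af::array input = af::randu(224, 224, 3);

    // Put all cuDNN work on ArrayFire's CUDA stream for the current device.
    cudnnHandle_t handle;
    cudnnCreate(&handle);
    cudnnSetStream(handle, afcu::getStream(af::getDevice()));

    // Hand the raw device pointer to cuDNN; device<T>() locks the buffer so
    // the memory manager will not recycle it while cuDNN is using it.
    float *d_input = input.device<float>();
    (void)d_input;
    // ... cudnnConvolutionForward(handle, ..., d_input, ...) elided ...

    input.unlock();        // return the buffer to the memory manager
    cudnnDestroy(handle);
    return 0;
}
```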
Thanks in advance!