arrayfire / arrayfire-ml

ArrayFire's Machine Learning Library.
BSD 3-Clause "New" or "Revised" License

Degraded performance for variable input size #44

Closed vineelpratap closed 6 years ago

vineelpratap commented 6 years ago

I have extended arrayfire-ml with cuDNN bindings and was running benchmarks to compare against the convnet benchmarks from https://github.com/soumith/convnet-benchmarks/.

The benchmarks are run in the following way:

// Define the Network

[Screenshot of the network definition code, 2018-06-18]
// Benchmark code
for (int i = 0; i < ntimes; ++i) {
    af::sync();
    auto s = af::timer::start();
    input = <INPUT_INITIALIZATION USING AF::RANDU>
    auto out = model.forward(input);
    out.backward();
    af::sync();
    auto e = af::timer::stop(s);
    std::cout << std::setprecision(5) << e * 1000.0 << std::endl;
}

I was able to match the performance of the torch7 cuDNN bindings if I make the input size always constant. However, if I pass random input sizes in the range [lo, hi], the average performance is actually worse than always sending input of size hi.

[Plot of per-iteration times, showing spikes at regular intervals]

You can notice the spikes at regular intervals, which increase the average time taken.

Note that all buffers and arrays are initialized using the af::array(..) constructor (no raw cudaMalloc calls), and all cuDNN operations are placed on ArrayFire's CUDA stream on the device.

I was wondering if the spikes at regular intervals suggest anything to you. Could the repeated memory allocations (of different sizes) be optimized by ArrayFire's memory manager?

Thanks in advance!

pavanky commented 6 years ago

@vineelpratap can you try the following?

1) af::setMemStepSize(10 * (1 << 20)) before the benchmarking loop.

2) Try running out.eval()

Also do you want to share the changes you've made to arrayfire-ml ?

vineelpratap commented 6 years ago

@pavanky - Yes, af::setMemStepSize(10 * (1 << 20)) fixes the issue :)

Yes, we plan to share the code with all the changes to the arrayfire-ml repo soon (within a month).