Closed — ahochheiden closed this issue 5 years ago
I think this is a really great idea @ahochheiden. If you want to open a new branch and implement it, please feel free to do so. We can merge the branch into master once you are finished implementing the code.
Another improvement that could be looked into is incorporating GPU acceleration via CUDA. @reafrancesco, do you think incorporating CUDA into our system is worth it, or should we avoid that for now?
It is definitely worth it, and I am actually using it for my visual attention work. If you check in /rtobotology/attention you can get an idea of how this is done. Also @gonzalezJohnas (my PhD student) has been working on this.
@milievski In order to speed up the process, there are two ways that can be envisaged:
With a CUDA implementation, keep in mind that copying the data from the CPU to the GPU and back takes time, so if a lot of transfers are needed, the CPU implementation could be more efficient.
https://github.com/TataLab/iCubAudioAttention/blob/987add069501dd7d5d3fdff3712dd36568890b4a/src/audioPreprocessing/src/beamFormer.cc#L102-L138
Since Fra raised some performance concerns, I think it might be worthwhile to refactor this multithreading code.
We should incorporate some kind of thread pool so that the threads are created only once and are ready before we need them. We could set the number of threads with a config value and determine the optimal count through testing. (I don't think we need more than one thread per logical core on the machine.)
The change to evenly distribute the work among the threads would then be simple: `void BeamFormer::audioMultiThreadingLoop(int i)` would keep the same parameter `i`, but the `i` passed in would now be the thread number; inside, we just increment `i` by the number of threads and loop while `i` is less than the total number of beams.
Something like this:
I think this only matters if we don't go down the WebRTC path for the nonlinear beamformer that Matt suggested, but I thought it was worth noting nonetheless.