Closed — ahochheiden closed this issue 5 years ago
I think this is a really great idea @ahochheiden. If you want to open a new branch and implement it, please feel free to do so. We can merge the branch into master once you are finished implementing the code.
Another improvement that could be looked into is incorporating GPU acceleration via CUDA. @reafrancesco, do you think incorporating CUDA into our system is worth it, or should we avoid that for now?
It is definitely worth it, and I am actually using it for my visual attention work. If you check in /rtobotology/attention you can get an idea of how this is done. Also @gonzalezJohnas (my PhD student) has been working on this.
@milievski In order to speed up the process, there are two ways that can be envisaged:
With a CUDA implementation, keep in mind that copying the data from the CPU to the GPU and back takes time, so if a lot of transfers are needed, the CPU implementation could be more efficient.
https://github.com/TataLab/iCubAudioAttention/blob/987add069501dd7d5d3fdff3712dd36568890b4a/src/audioPreprocessing/src/beamFormer.cc#L102-L138
Since Fra raised some performance concerns, I think it might be worthwhile to refactor this multithreading code.
We should incorporate some kind of thread pool so that the threads are created only once and are ready before we need them. We could set the number of threads with a config value and determine the optimal count through testing. (I don't think we need more than one thread per logical core on the machine.)
The change to evenly distribute the work among the threads would then be simple: `void BeamFormer::audioMultiThreadingLoop(int i)` would keep the same parameter `i`, but the `i` passed in would now be the thread number; inside, we just increment `i` by the number of threads and loop while `i` is less than the total number of beams.
Something like this:
I think this only matters if we don't go down the WebRTC path for the nonlinear beamformer that Matt suggested, but I thought it was worth noting nonetheless.