This is a proposed modification to kur that distributes the pre-processing of audio data in the speech recognition supplier across multiple processes. Pre-processing audio data is computationally expensive, mainly because of the FFTs that are calculated. When running with multiple GPUs and a large batch size, the current configuration, which does all pre-processing on one thread, cannot prepare data fast enough, and batch times suffer as a result.
The proposed solution moves the computationally expensive function call (get_audio_features) into a worker function that is submitted to a ProcessPoolExecutor from Python's standard-library concurrent.futures module. The default behavior is to use a pool size of n - 1, where n is the number of CPUs on the system as reported by multiprocessing.cpu_count(). This default may not be optimal on every system, so users can manually specify the number of processes used for data pre-processing. This is done with the "data_cpus" option for the speech recognition supplier.
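The approach described above can be illustrated with a minimal sketch. It is not the code in this change: fake_audio_features is a hypothetical stand-in for get_audio_features (a naive DFT, chosen only so the example needs nothing beyond the standard library), and resolve_pool_size and preprocess_batch are illustrative names. The handling of the "data_cpus" value is likewise an assumption about how such an option might be resolved.

```python
import math
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def resolve_pool_size(data_cpus=None):
    """Pick the worker count: an explicit value, or n - 1 CPUs by default."""
    if data_cpus is None:
        # Leave one CPU free for the main process; never go below one worker.
        return max(1, multiprocessing.cpu_count() - 1)
    if data_cpus < 1:
        raise ValueError("data_cpus must be at least 1")
    return data_cpus


def fake_audio_features(samples):
    """Hypothetical stand-in for get_audio_features: naive DFT magnitudes."""
    n = len(samples)
    features = []
    for k in range(n // 2 + 1):
        real = sum(x * math.cos(2 * math.pi * k * t / n)
                   for t, x in enumerate(samples))
        imag = -sum(x * math.sin(2 * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        features.append(math.hypot(real, imag))
    return features


def preprocess_batch(batch, data_cpus=None):
    """Compute features for every item in a batch using a process pool."""
    # The worker must be a module-level function so it can be pickled.
    with ProcessPoolExecutor(max_workers=resolve_pool_size(data_cpus)) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(fake_audio_features, batch))
```

Calling preprocess_batch(batch, data_cpus=4) would use four worker processes, while omitting data_cpus falls back to the n - 1 default. This sketch creates a new pool for every batch to stay short; a long-running supplier would more plausibly create the pool once and reuse it. On platforms that start worker processes by spawning, the call should be made from under an `if __name__ == "__main__":` guard.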
Testing shows that speedups of at least 2x are possible with this approach.
…rec.py