After comparing several libraries, we found that Librosa is the most commonly used one in Python and provides most of the important features. Since Librosa can be reused as an external dependency, we can integrate it into MLPro for loading and processing audio data rather than reinventing the wheel.
There are three main components of audio data: amplitude, time, and frequency.
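To make the three components concrete, here is a minimal sketch that synthesizes a sampled sine tone with NumPy. The sampling rate, duration, amplitude, and frequency values are illustrative assumptions, not values from MLPro. Time is represented by one timestamp per sample (t_n = n / sr), amplitude by the sample values, and frequency by how fast the signal oscillates.

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from MLPro)
sr = 8000          # sampling rate in Hz (samples per second)
duration = 0.5     # signal length in seconds (time)
amplitude = 0.8    # peak amplitude
frequency = 440.0  # tone frequency in Hz

# Time axis: one timestamp per sample, t_n = n / sr
t = np.arange(int(sr * duration)) / sr

# Sampled sine tone: x[n] = A * sin(2 * pi * f * t_n)
y = amplitude * np.sin(2 * np.pi * frequency * t)

print(f"{len(y)} samples covering {len(y) / sr:.3f} s, peak amplitude {np.max(np.abs(y)):.3f}")
```

The number of samples divided by the sampling rate gives the duration, which is how "time" is recovered from a plain array of samples.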
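We can load audio data from .wav files with librosa.load(....), which returns the samples as a NumPy array together with the sampling rate. By default, librosa.load resamples to 22050 Hz and mixes down to mono; passing sr=None keeps the file's native sampling rate. Depending on the installed audio backend, librosa.load may also read mp3 files directly; otherwise they can first be converted to .wav with a converter (e.g. from pydub import AudioSegment).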
Librosa provides several types of visualizations (librosa.display.[plot type]), which could be incorporated into our MLPro-Streams visualization.
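Fourier transform -> converts a signal from the time domain (x-axis = time, y-axis = amplitude) to the frequency domain (x-axis = frequency, y-axis = magnitude). For sampled audio data, the discrete Fourier transform is used, usually computed with the fast Fourier transform (FFT).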
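A minimal sketch of this time-to-frequency conversion, using NumPy's FFT instead of librosa. The two-tone test signal is an assumption chosen so that each tone falls exactly on an FFT bin (1 s of data at 8000 Hz gives 1 Hz resolution). The two largest magnitudes then appear at the tone frequencies, and scaling the magnitude by 2/N recovers each tone's amplitude.

```python
import numpy as np

sr = 8000                 # sampling rate in Hz (assumption)
t = np.arange(sr) / sr    # 1 second of timestamps -> 1 Hz frequency resolution

# Time-domain signal: 440 Hz tone (amplitude 1.0) plus 1000 Hz tone (amplitude 0.5)
y = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# FFT for real input: returns only the non-negative frequencies
spectrum = np.fft.rfft(y)
freqs = np.fft.rfftfreq(len(y), d=1 / sr)  # frequency in Hz of each bin
magnitude = np.abs(spectrum)

# Scale so a pure sine of amplitude A shows up as A in its bin
amplitudes = 2 * magnitude / len(y)

# The two largest magnitudes sit at the frequencies of the two tones
peak_freqs = np.sort(freqs[np.argsort(magnitude)[-2:]])
print(peak_freqs, amplitudes[440], amplitudes[1000])
```

For real recordings, tones rarely align with bins, so energy leaks into neighboring bins; this is one reason spectrograms typically apply a window function before the FFT.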
In SIM mode, audio data shall be streamed from a defined mp3 file. In REAL mode, data shall be imported from a microphone...
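One possible way to emulate SIM mode is to load the file once and then replay it in fixed-size blocks, as if the samples arrived from a live source. The generator below is a hypothetical sketch, not MLPro's stream API; the block size and the random input signal are assumptions, and the final block is zero-padded so that all blocks have the same length.

```python
import numpy as np


def stream_blocks(y, block_size):
    """Hypothetical helper: yield consecutive, non-overlapping blocks of y.

    The last block is zero-padded to block_size so every block has equal length.
    """
    n_blocks = int(np.ceil(len(y) / block_size))
    for i in range(n_blocks):
        block = y[i * block_size:(i + 1) * block_size]
        if len(block) < block_size:
            block = np.pad(block, (0, block_size - len(block)))
        yield block


# Assumed input: 1 s of random samples at 16 kHz (stands in for a loaded file)
sr = 16000
rng = np.random.default_rng(0)
y = rng.uniform(-1.0, 1.0, sr).astype(np.float32)

blocks = list(stream_blocks(y, block_size=1024))
print(len(blocks), blocks[0].shape)
```

In SIM mode, y would come from the loaded (and, if necessary, converted) mp3 file. librosa also offers librosa.stream for block-wise reading of files that soundfile can decode, which may be worth evaluating. In REAL mode, the blocks would instead be delivered by a microphone capture library; which library to use is left open here.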
Cross references:
https://wiki.python.org/moin/Audio
https://librosa.org/
https://towardsdatascience.com/understanding-audio-data-fourier-transform-fft-spectrogram-and-speech-recognition-a4072d228520
https://www.youtube.com/watch?v=ZqpSb5p1xQo&t=200s