lucasgautheron closed this issue 3 years ago
Good point. Also, please note that we do not yet know how to integrate annotations (made by humans or machines) that could be derived from a single channel or from a combination of channels. (For instance, perhaps humans annotate best when they hear, binaurally, the "front" channel plus one other channel.)
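To make the binaural idea a bit more concrete, here is a rough sketch that builds a two-channel listening track from a multichannel recording, with the "front" channel in one ear and another channel in the other. It assumes the audio is already loaded as a NumPy array of shape `(n_samples, n_channels)`. The function names and the index of the "front" channel are hypothetical, since the device's channel layout isn't specified here.

```python
import numpy as np


def binaural_pair(audio: np.ndarray, front: int, other: int) -> np.ndarray:
    """Return a (n_samples, 2) array: `front` channel on the left, `other` on the right."""
    if audio.ndim != 2:
        raise ValueError("expected audio of shape (n_samples, n_channels)")
    n_channels = audio.shape[1]
    for ch in (front, other):
        if not 0 <= ch < n_channels:
            raise ValueError(f"channel {ch} out of range for {n_channels} channels")
    return np.stack([audio[:, front], audio[:, other]], axis=1)


def candidate_pairs(n_channels: int, front: int) -> list:
    """All (front, other) combinations an annotator could be played."""
    return [(front, c) for c in range(n_channels) if c != front]
```

Generating every `(front, other)` pair would let us compare annotation quality across combinations before committing to one.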
We are already facing this problem with Marvin's pilot. More discussion with the BabyLogger team is needed, but I have proposed the following solution for Marvin's classification task: for each sampled 30 s window, we retain the channel with the highest energy. If the mics are directional, this might be a good way to maximize the signal-to-noise ratio, because we expect the channel pointed towards the speaker to have the highest energy. Of course there might be better combinations, but this is the least arbitrary one IMO. However, in conversations it might suppress one of the speakers: if two people talk from different directions within the same window, only the channel facing the louder one is kept. What do you think?
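A minimal sketch of the per-window selection described above, assuming the recording is already loaded as a NumPy array of shape `(n_samples, n_channels)` and taking "energy" to mean the mean squared amplitude of each channel over the window. The function name `select_loudest_channel` is hypothetical, and a trailing partial window is simply treated as its own window.

```python
from __future__ import annotations

import numpy as np


def select_loudest_channel(
    audio: np.ndarray, sample_rate: int, window_s: float = 30.0
) -> tuple[np.ndarray, list[int]]:
    """For each window, keep only the channel with the highest mean power.

    Returns the resulting mono signal and the chosen channel index per window.
    """
    if audio.ndim != 2:
        raise ValueError("expected audio of shape (n_samples, n_channels)")
    window = int(round(window_s * sample_rate))
    n_samples = audio.shape[0]
    mono = np.empty(n_samples, dtype=audio.dtype)
    chosen = []
    for start in range(0, n_samples, window):
        end = min(start + window, n_samples)
        chunk = audio[start:end].astype(np.float64)
        energy = np.mean(chunk ** 2, axis=0)  # mean power of each channel
        best = int(np.argmax(energy))  # ties go to the lowest channel index
        chosen.append(best)
        mono[start:end] = audio[start:end, best]
    return mono, chosen
```

Because the choice is made independently per window, the output can switch channels at window boundaries, which is exactly where a quieter conversational partner could be dropped.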
This will be needed in order to handle the BabyLogger audio, which is recorded on several channels. We need to decide how to make it work.
Here are the functionalities that are affected:
One way would be to have one profile per channel, and to add the channel as an option to the ConversionPipeline.
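A rough sketch of the "one profile per channel" idea, using only the Python standard library: each channel of a 16-bit PCM WAV file is written to a mono file under its own profile directory. The helper names and the `<prefix>_ch<N>` directory naming are hypothetical; this does not use or reflect the actual ConversionPipeline API, it only illustrates what a per-channel option would have to do.

```python
import array
import os
import tempfile
import wave


def extract_channel(src: str, dst: str, channel: int) -> None:
    """Write channel `channel` of a 16-bit PCM WAV file to a mono WAV file."""
    with wave.open(src, "rb") as reader:
        n_channels = reader.getnchannels()
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if not 0 <= channel < n_channels:
            raise ValueError(f"channel {channel} out of range for {n_channels} channels")
        rate = reader.getframerate()
        samples = array.array("h", reader.readframes(reader.getnframes()))
    # Frames are interleaved as [c0, c1, ..., c0, c1, ...]; take every n-th sample.
    mono = samples[channel::n_channels]
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(mono.tobytes())


def convert_all_channels(src: str, out_dir: str, profile_prefix: str = "standard") -> list:
    """Create one (hypothetical) profile directory per channel and fill it."""
    with wave.open(src, "rb") as reader:
        n_channels = reader.getnchannels()
    paths = []
    for ch in range(n_channels):
        profile_dir = os.path.join(out_dir, f"{profile_prefix}_ch{ch}")
        os.makedirs(profile_dir, exist_ok=True)
        dst = os.path.join(profile_dir, os.path.basename(src))
        extract_channel(src, dst, ch)
        paths.append(dst)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "recording.wav")
        # Interleaved stereo frames: (1, -1), (2, -2), (3, -3)
        with wave.open(src, "wb") as writer:
            writer.setnchannels(2)
            writer.setsampwidth(2)
            writer.setframerate(16000)
            writer.writeframes(array.array("h", [1, -1, 2, -2, 3, -3]).tobytes())
        for path in convert_all_channels(src, tmp):
            print(path)
```

Keeping channels as separate profiles leaves the choice of channel (or combination) to downstream steps, rather than baking it into the conversion.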