Stereo audio - Githubissues

Hi,

I think it is doable, even with our pretrained model.

These are the lines where we select the first channel; you need to change them.

https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/dataloader.py#L112

https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/dataloader.py#L116

You also need to work on the fbank extraction to make sure the output is two-channel.

https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/dataloader.py#L126
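One way to get a two-channel fbank is to run the existing mono extraction once per channel and stack the results. The sketch below is only an illustration: the helper name `stack_channel_features`, the `(channels, samples)` waveform layout, and the `extract_fn` argument (standing in for whatever mono fbank call the dataloader already makes) are assumptions, not code from the repository.

```python
import torch

def stack_channel_features(waveform: torch.Tensor, extract_fn) -> torch.Tensor:
    """Apply a mono feature extractor to each channel and stack the results.

    waveform:   (channels, samples) tensor (assumed layout).
    extract_fn: hypothetical callable mapping a (1, samples) tensor to a
                (frames, mel_bins) tensor, e.g. the existing mono fbank call.
    Returns a (channels, frames, mel_bins) tensor.
    """
    features = [
        extract_fn(waveform[c:c + 1])  # keep a leading channel dim of size 1
        for c in range(waveform.shape[0])
    ]
    return torch.stack(features, dim=0)

# Toy usage with a placeholder extractor (NOT a real fbank): it turns each
# sample into a "frame" with 3 identical "bins", just to show the shapes.
placeholder_extractor = lambda w: w.T.repeat(1, 3)
stereo = torch.arange(8.0).reshape(2, 4)
stereo_features = stack_channel_features(stereo, placeholder_extractor)
print(stereo_features.shape)
```

The extra leading dimension is the new channel axis that the rest of the pipeline then has to carry through.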

This introduces a new channel dimension, which was squeezed out for single-channel fbanks, so you also need to take care of the input pre-processing on the model side.

https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L436

Note that we do this in several forward passes; the line above is just one of them.
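A rough sketch of what the model-side handling could look like, assuming the mono input is shaped `(batch, time, freq)`, the stereo input is `(batch, channels, time, freq)`, and the model wants `(batch, channels, freq, time)`. These layouts and the helper name `to_model_input` are assumptions for illustration, so check them against the linked line before relying on this.

```python
import torch

def to_model_input(x: torch.Tensor) -> torch.Tensor:
    """Bring an fbank batch into (batch, channels, freq, time) layout.

    Accepts either (batch, time, freq) for mono or
    (batch, channels, time, freq) for multi-channel input (assumed layouts).
    """
    if x.dim() == 3:
        # Mono: add the channel dimension that a multi-channel batch already has.
        x = x.unsqueeze(1)
    elif x.dim() != 4:
        raise ValueError(f"expected a 3-D or 4-D tensor, got {x.dim()}-D")
    # Swap the time and frequency axes.
    return x.transpose(2, 3)

mono_batch = torch.zeros(4, 100, 128)       # (batch, time, freq)
stereo_batch = torch.zeros(4, 2, 100, 128)  # (batch, channels, time, freq)
print(to_model_input(mono_batch).shape, to_model_input(stereo_batch).shape)
```

Putting this in one helper and calling it from every forward pass would avoid having to patch each one separately.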

Then you need to change the model so that it takes two input channels instead of one.

https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L130
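If the layer in question is a `torch.nn.Conv2d` patch projection with one input channel (an assumption here, as is the helper name `expand_conv_in_channels`), one possible way to keep using pretrained weights is to copy them to every new input channel and rescale. This is a sketch of that idea, not something the repository provides.

```python
import torch

def expand_conv_in_channels(conv: torch.nn.Conv2d, in_channels: int = 2) -> torch.nn.Conv2d:
    """Build a Conv2d with `in_channels` inputs from a single-channel Conv2d.

    The single-channel kernel is copied to every input channel and divided by
    `in_channels`, so feeding identical channels reproduces the mono output.
    """
    if conv.in_channels != 1:
        raise ValueError("this sketch only handles a single-channel source conv")
    new_conv = torch.nn.Conv2d(
        in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        # (out, 1, kh, kw) -> (out, in_channels, kh, kw), rescaled.
        new_conv.weight.copy_(conv.weight.repeat(1, in_channels, 1, 1) / in_channels)
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

# Toy usage: a 16x16 patch projection with made-up sizes.
mono_proj = torch.nn.Conv2d(1, 8, kernel_size=(16, 16), stride=(16, 16))
stereo_proj = expand_conv_in_channels(mono_proj, in_channels=2)
print(stereo_proj.weight.shape)
```

Starting from the copied weights means the stereo model initially behaves like the mono model on the channel average, and fine-tuning can then learn channel-specific filters.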

In short, it needs some (careful) changes to the code, but it is doable. I am not sure about your purpose, but it will be easier if you can add the two channels together into a single channel.
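For the simpler route, a minimal sketch of mixing stereo down to mono before the existing pipeline, assuming the waveform is a `(channels, samples)` tensor. The helper name `downmix_to_mono` is hypothetical, and averaging is used instead of a plain sum so the amplitude stays in the original range.

```python
import torch

def downmix_to_mono(waveform: torch.Tensor) -> torch.Tensor:
    """Average all channels of a (channels, samples) waveform into (1, samples)."""
    return waveform.mean(dim=0, keepdim=True)

stereo = torch.tensor([[1.0, 2.0, 3.0],
                       [3.0, 2.0, 1.0]])
mono = downmix_to_mono(stereo)
print(mono.shape)
```

With this approach the rest of the code and the pretrained model can stay unchanged, at the cost of discarding any spatial information carried by the difference between channels.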

-Yuan

YuanGongND / ssast

Stereo audio #18