felixbur / nkululeko

Machine learning speaker characteristics
MIT License

Add resampling functionality #58

Closed felixbur closed 10 months ago

felixbur commented 10 months ago

Most models require a 16 kHz sampling rate, but data might come at other rates, so it would be nice to resample data automatically.

bagustris commented 10 months ago

For wav2vec2 and HuBERT, audio is already resampled on the fly if it is not at 16 kHz:

https://github.com/felixbur/nkululeko/blob/64a949a37728726dddbc0bc25e4b351ea82df6ec/nkululeko/feat_extract/feats_wav2vec2.py#L51-L55

For other models, I provided a Python script in the emofilm data directory that converts files to 16 kHz: convert_to_16k.py. It uses sox as its backend, so it may only work on Unix.

If needed, I propose using torchaudio.transforms.Resample, which avoids adding a new requirement.

felixbur commented 10 months ago

OK, I needed that for the MOS and SNR models and can add it there. The disadvantage, of course, is that a database that is not in 16 kHz will then be resampled over and over again, potentially four times in one run. So I wonder if we should implement a "resample" module that would affect only the train and test splits of the project (not the whole databases).
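To make the "resample once per split" idea concrete, here is a small sketch of how such a module could cache its output. Everything in it is hypothetical: `resample_split`, the `out_dir` layout, and the `resample_file` callback are illustrative names, not the design that ended up in nkululeko. The actual resampling is injected as a callable, and the demo uses a stand-in that only copies the file.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def resample_split(
    files: Iterable[str],
    out_dir: str,
    resample_file: Callable[[str, str], None],
) -> List[str]:
    """Resample each file of a split at most once and return the new paths.

    Hypothetical helper: files that already exist in out_dir are reused,
    so repeated feature extractors do not trigger repeated resampling.
    """
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    new_paths = []
    for src in files:
        # Note: files sharing a basename would collide in this simple layout.
        dst = out_root / Path(src).name
        if not dst.exists():
            resample_file(src, str(dst))
        new_paths.append(str(dst))
    return new_paths


calls = []


def fake_resample(src: str, dst: str) -> None:
    # Stand-in for a real resampler: records the call and copies the file.
    calls.append(src)
    shutil.copyfile(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.wav"
    src.write_bytes(b"")
    cache_dir = str(Path(tmp) / "16k")
    first = resample_split([str(src)], cache_dir, fake_resample)
    second = resample_split([str(src)], cache_dir, fake_resample)
```

In the demo the second call finds the cached file and skips the resampler, which is the behavior that would avoid converting the same split several times in one run.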

felixbur commented 10 months ago

Done in 0.62.0.