fgnt / sms_wsj

SMS-WSJ: Spatialized Multi-Speaker Wall Street Journal database for multi-channel source separation and recognition
MIT License

fftconvolve and baseline ASR #4

Status: Closed (hangtingchen closed this 4 years ago)

hangtingchen commented 4 years ago

Thanks a lot for the well-prepared dataset. I have just created the dataset on my own machines. One problem I ran into, at https://github.com/fgnt/sms_wsj/blob/620708a97d1ce81dd9fffc2c7916fccb5230baf5/sms_wsj/database/utils.py#L172 , was that fftconvolve did not support the axis option, and I could not find the option in the scipy documentation either: https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.signal.fftconvolve.html . I just removed the option and everything went fine.

My other question is how to evaluate on the dataset, for example where to find the Kaldi-based baseline ASR script.

jensheit commented 4 years ago

Thanks for your interest in SMS-WSJ.

The fftconvolve problem should be fixed by updating your scipy version to at least scipy 1.2.0: https://docs.scipy.org/doc/scipy-1.2.0/reference/generated/scipy.signal.fftconvolve.html
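To check whether an installed scipy is new enough before running the database creation, a minimal sketch could look like the following. The helper names (`parse_version`, `supports_axes`, `convolve_last_axis`) are hypothetical and not part of sms_wsj. The keyword is spelled `axes` in the scipy documentation linked above, and the choice of `axes=-1` (convolving along the last, time-like axis) is an assumption about the intended use, not a copy of the linked line.

```python
import re

# Minimum version taken from the reply above; everything else here is illustrative.
MIN_SCIPY = (1, 2, 0)


def parse_version(text):
    """Turn '1.2.0' or '0.16.1' into a 3-tuple of ints, ignoring suffixes like 'rc1'."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def supports_axes(scipy_version):
    """Return True if this scipy version is at least 1.2.0."""
    return parse_version(scipy_version) >= MIN_SCIPY


def convolve_last_axis(signal, impulse_response):
    """Convolve along the last axis (assumed to be time).

    scipy is imported lazily so the version helpers above work without it.
    Both inputs need the same number of dimensions.
    """
    import scipy
    from scipy.signal import fftconvolve

    if not supports_axes(scipy.__version__):
        raise RuntimeError(
            f"scipy {scipy.__version__} is too old, need >= 1.2.0 for 'axes'"
        )
    return fftconvolve(signal, impulse_response, axes=-1)
```

Removing the option, as described in the first comment, makes the call convolve over all axes, which is only equivalent when the inputs are effectively one-dimensional, so upgrading scipy is the safer route.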

To evaluate the database, we expect the first stage of the kaldi wsj baseline to be run: https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/run.sh

Afterwards you can run our ASR baseline script with the following command: python -m sms_wsj.train_baseline_asr with egs_path=$KALDI_ROOT/egs/ json_path=/path/to/sms_wsj.json where the path to sms_wsj.json was defined during the creation of the database and is by default cache/sms_wsj
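If the baseline is launched from a Python driver rather than a shell, the same command can be assembled as an argument list. This is only a sketch: `build_baseline_command` is a hypothetical helper, and the example paths are placeholders. Only the module name and the `egs_path=` / `json_path=` arguments come from the command quoted above.

```python
import sys


def build_baseline_command(kaldi_root, json_path):
    """Assemble the argument list for the baseline command quoted above.

    kaldi_root: the directory that $KALDI_ROOT points to (placeholder value).
    json_path:  path to the sms_wsj.json written during database creation.
    """
    egs_path = kaldi_root.rstrip("/") + "/egs/"
    return [
        sys.executable, "-m", "sms_wsj.train_baseline_asr",
        "with",
        f"egs_path={egs_path}",
        f"json_path={json_path}",
    ]


# Example with placeholder paths; pass the list to subprocess.run(cmd, check=True)
# once the first stage of the Kaldi WSJ recipe has been run.
cmd = build_baseline_command("/opt/kaldi", "/path/to/sms_wsj.json")
```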

hangtingchen commented 4 years ago

Thank you very much.