MyrtleSoftware / myrtlespeech

Speech recognition

CommonVoice Dataset #6

Closed SamG97 closed 4 years ago

SamG97 commented 4 years ago

Added support for the CommonVoice dataset and training / evaluating with more than one dataset at a time.

julian edit: should we consider adding this with PyTorch natively? Also, for combining two datasets, I've just found the following torch class: https://pytorch.org/docs/master/data.html#torch.utils.data.ConcatDataset
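For reference, a minimal sketch of how `ConcatDataset` could combine two datasets. `ToyUtterances` and its contents are hypothetical stand-ins, not the dataset classes from this branch; only `torch.utils.data.ConcatDataset` and `Dataset` are real PyTorch names here.

```python
from torch.utils.data import ConcatDataset, Dataset


class ToyUtterances(Dataset):
    """Hypothetical stand-in for a speech dataset of (audio, transcript) pairs."""

    def __init__(self, name, transcripts):
        self.name = name
        self.transcripts = transcripts

    def __len__(self):
        return len(self.transcripts)

    def __getitem__(self, idx):
        # A real dataset would load audio here; a placeholder string is used instead.
        return f"{self.name}-audio-{idx}", self.transcripts[idx]


# Assumed toy data, standing in for LibriSpeech-like and CommonVoice-like sets.
librispeech_like = ToyUtterances("ls", ["hello world", "good morning"])
commonvoice_like = ToyUtterances("cv", ["open the door", "thank you", "see you soon"])

# ConcatDataset exposes the datasets back to back under one index range.
combined = ConcatDataset([librispeech_like, commonvoice_like])

print(len(combined))  # total number of items across both datasets
print(combined[2])    # first item of the second dataset
```

Indices run through the first dataset and then continue into the second, so the result can be handed to a single `DataLoader`. Whether this fits how the branch batches and samples is not something the sketch addresses.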

SamG97 commented 4 years ago

Waiting on an evaluation run on GCP before merging.

SamG97 commented 4 years ago

Experiments run: https://github.com/MyrtleSoftware/myrtlebench/blob/master/README.md

CV seems really hard to learn on: even the decently performing LS model gets a 100% WER on the CV test partition.
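To put the 100% figure in context, here is a rough sketch of WER under the usual definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is an assumption about the metric, not a copy of how the experiments above computed it, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat", "the cat sat"))  # perfect transcript
print(word_error_rate("the cat sat", "a dog ran"))    # every word wrong
print(word_error_rate("the cat sat", ""))             # empty output
```

Under this definition, a 100% WER means the number of edits equals the number of reference words, which happens both when every word is wrong and when the model emits nothing at all. Insertions can also push the value above 100%.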

samgd commented 4 years ago

@SamG97 now that master is updated, can this be rebased?

samgd commented 4 years ago

Lots of effort and thinking have gone into the branch but I suggest we close it without merging now that torchaudio includes built-in support for LibriSpeech and CommonVoice. Create a new PR that upgrades our repository for the new PyTorch and torchaudio?

samgd commented 4 years ago

@julianmack ^

julianmack commented 4 years ago

> Lots of effort and thinking have gone into the branch but I suggest we close it without merging now that torchaudio includes built-in support for LibriSpeech and CommonVoice. Create a new PR that upgrades our repository for the new PyTorch and torchaudio?

Yes I agree. Closing this PR and I'll open a WIP PR for using torchaudio instead.