galv closed 3 years ago
Please read commit 276d902 for an explanation.
Also, please don't cannibalize create_asr_features.py. Doing that makes it impossible for us to rerun librispeech feature extraction in the future. In cases like these, where we can't really unit-test everything, it's better to just copy-and-modify, which is what I did in my commit (I copied your changes to a new file called create_peoples_speech_asr_features.py). Can you please revert the changes you made to create_asr_features.py before merging?
BTW, I estimated 86.5 hours to featurize the training set on a single machine with a single job, and that job wasn't running at full capacity. I suspect that if you booted up a 16-core machine overnight and ran ~20 separate shards, this would run to completion relatively quickly.
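For a rough sense of what "relatively quickly" could mean, here is a back-of-the-envelope sketch. It assumes things I haven't measured: that the 86.5 hours splits evenly across shards, that each shard occupies exactly one core, and that there is no I/O contention. The function name is hypothetical and is not part of any script in this repo.

```python
import math


def estimated_wall_clock_hours(total_hours, num_shards, num_workers):
    """Rough wall-clock estimate under assumed perfect scaling.

    Assumptions (not measured): work splits evenly across shards,
    each shard uses exactly one worker, and shards run in "waves"
    of at most num_workers at a time.
    """
    hours_per_shard = total_hours / num_shards
    waves = math.ceil(num_shards / num_workers)
    return hours_per_shard * waves


# 86.5 single-job hours, ~20 shards, 16 cores -> 2 waves of ~4.3 hours each.
estimate = estimated_wall_clock_hours(86.5, 20, 16)
print(f"{estimate:.2f} hours")  # prints "8.65 hours"
```

With 20 shards on 16 cores the second wave only keeps 4 cores busy, so under the same assumptions 16 or 32 shards would use the machine more evenly.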
Closing this in favor of #11
Let's just make this a PR for now, to make it easy to compare your changes.