Yup, the intention was to mimic the Kaldi setup, using the core-test set as the "test" set. The training and dev sets might be split in a different way but should contain the same number of speakers.
I have followed your repo for a long time and appreciate your work very much, but I still wonder why you don't split the TIMIT data the same way others do. I tried splitting it in the more common way (462 speakers from timit/train as the train set, 50 speakers from timit/test as the dev set, and 24 speakers from timit/test as the core test set). That gave 19.9% WER for ctc_config.json in examples/timit, which is not very good performance. If you provided strong baseline results with the more common data split, it would be very helpful for other people who are interested in this repo and want to do extension work (that's why I see this issue as important). Thank you again for your work here!
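For concreteness, here is a rough sketch of that standard split. The speaker-list paths mirror the Kaldi recipe layout (egs/timit/s5/conf/dev_spk.list with 50 speakers, conf/test_spk.list with the 24 core-test speakers); those paths and the TIMIT root below are placeholders to adjust for your setup:

```python
import glob
import os

# Placeholder speaker-list paths, following the Kaldi recipe's conf/ files.
def load_speakers(path):
    with open(path) as f:
        return {line.strip().upper() for line in f if line.strip()}

dev_spk = load_speakers("conf/dev_spk.list")    # 50 dev speakers
test_spk = load_speakers("conf/test_spk.list")  # 24 core-test speakers

def collect_utts(timit_root, subset, keep=None):
    """Gather (speaker, wav_path) pairs for a TIMIT subset, dropping the
    SA1/SA2 dialect sentences as the standard recipe does. Directory
    casing (TRAIN/train, .WAV/.wav) varies between corpus copies."""
    utts = []
    for wav in glob.glob(os.path.join(timit_root, subset, "DR*", "*", "*.WAV")):
        spk = os.path.basename(os.path.dirname(wav)).upper()
        utt = os.path.splitext(os.path.basename(wav))[0].upper()
        if utt.startswith("SA"):  # SA sentences are read by every speaker
            continue
        if keep is None or spk in keep:
            utts.append((spk, wav))
    return utts

root = "/path/to/TIMIT"  # placeholder
train = collect_utts(root, "TRAIN")               # all 462 training speakers
dev = collect_utts(root, "TEST", keep=dev_spk)    # 50-speaker dev set
test = collect_utts(root, "TEST", keep=test_spk)  # 24-speaker core test set
```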
Hi, I checked the Kaldi setup and you are correct. The training set there uses only 462 speakers. I'm using the same test set though. I've updated the code to generate the same training set. The dev set is the same size in terms of speakers and utterances, but may not contain the exact same speakers. The test set is identical.
I will need to rerun experiments to have an apples-to-apples comparison.
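For an apples-to-apples check after regenerating the sets, the expected sizes under the standard recipe are well known: 3696 train, 400 dev, and 192 core-test utterances once the SA sentences are dropped (8 SX/SI utterances per speaker). Continuing the sketch above:

```python
# Continuing the sketch above: verify each set against the sizes the
# standard recipe produces.
def summarize(name, utts, n_spk, n_utt):
    spks = {s for s, _ in utts}
    print(f"{name}: {len(spks)} speakers (expect {n_spk}), "
          f"{len(utts)} utterances (expect {n_utt})")

summarize("train", train, 462, 3696)
summarize("dev", dev, 50, 400)
summarize("test", test, 24, 192)
```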
Thanks for your attention to this! However, it would be better to use exactly the same train/dev/test sets as others do, for ease of comparison...
The training set is exactly the same, the dev set is a bit different. I'll look into making it the same, but this is lower priority at the moment.
I've been looking for "the right way" to split TIMIT myself. Is it common practice to use 50 speakers from the original test set as the dev set?
Normally we use the standard 462-speaker data as the training set, while this TIMIT example uses 556-speaker data (including some data from the full test set) in train.json. Although the WER results in this repo seem pretty promising, are the results here really convincing or comparable?
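One concrete way to check is to count the distinct speakers in the shipped JSON files and look for train/test overlap. A minimal sketch, assuming each line of train.json is a JSON object with an "audio" path whose parent directory names the speaker (which matches the TIMIT layout, but is an assumption about this repo's format), and with placeholder file paths:

```python
import json
import os

def speakers_in(json_path):
    """Collect speaker IDs from a json-lines manifest, assuming each
    utterance's parent directory names the speaker (TIMIT layout)."""
    spks = set()
    with open(json_path) as f:
        for line in f:
            entry = json.loads(line)
            spks.add(os.path.basename(os.path.dirname(entry["audio"])).upper())
    return spks

train_spk = speakers_in("train.json")  # placeholder paths
test_spk = speakers_in("test.json")
print(len(train_spk))                # 556 here vs. the standard 462
print(sorted(train_spk & test_spk))  # any overlap inflates the results
```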