Why isn't the GoogleRandomSubsetWERFilter class included in the process.py pipeline?
First of all, thanks for releasing this project as open source. Awesome work, thank you so much, KTSpeechCrawler team. :)
I tried the KTSpeechCrawler project to collect YouTube audio datasets for an ASR (speech-to-text) task.
I ran all the steps through to the end, and then checked the transcripts against their corresponding audio files (.wav, .txt).
In that check, 11 out of 100 audio files had mistakes.
If we apply google_speech_test and remove the samples that score below the threshold (threshold=0.85), we should be left with properly matched audio files and transcripts.
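To make the idea concrete, here is a rough sketch of the kind of check I mean. It is not the project's code: `word_error_rate` and `keep_sample` are hypothetical names, the recognizer output is passed in as a plain string instead of calling any speech API, and I am assuming the 0.85 threshold is a minimum similarity score (1 minus the word error rate).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Empty reference: perfect only if the hypothesis is empty too.
        return 0.0 if not hyp else 1.0
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def keep_sample(transcript: str, recognized: str, threshold: float = 0.85) -> bool:
    """Keep a sample only if its similarity (1 - WER) reaches the threshold.

    Assumption: the threshold is a minimum similarity, not a maximum WER.
    """
    similarity = 1.0 - word_error_rate(transcript, recognized)
    return similarity >= threshold
```

With this reading, a sample whose subtitle text and recognized text agree word for word is kept, while one wrong word in a six-word caption (similarity about 0.83) is already dropped.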
Could you please tell me where I should start, and where to add this module, in order to run google_speech_test?
Are there any complications I should expect when using google_speech_test?
Why does GOOGLE_TEST default to OK?
Regarding the pipeline in process.py:
at which point should I add this module? Is adding it as the last step enough?
Thank you sir :)