espnet / espnet_model_zoo

ESPnet Model Zoo
Apache License 2.0

Add reazon-research/reazonspeech-espnet-v1 #70

Closed fujimotos closed 1 year ago

fujimotos commented 1 year ago

We've recently trained an ESPnet model on a 15,000-hour Japanese audio corpus harvested from Japanese TV programs.

Today we released the model on Hugging Face under Apache License v2.0:

We can confirm that this model achieves accuracy comparable to OpenAI Whisper large-v2, so we believe it is a very good showcase of ESPnet2's capabilities.
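For readers who want to try the model, here is a minimal sketch of loading it through ESPnet2's `Speech2Text.from_pretrained`, which relies on espnet_model_zoo to resolve the tag. The model tag is taken from the title of this PR. The helper name `transcribe`, the use of `soundfile` to read audio, and the expectation of 16 kHz mono input are assumptions, not something stated in this thread.

```python
# Model tag as given in the PR title.
MODEL_TAG = "reazon-research/reazonspeech-espnet-v1"


def transcribe(wav_path, model_tag=MODEL_TAG):
    """Return the best hypothesis text for one audio file.

    Hypothetical helper: imports are kept inside the function so the
    module can be loaded without espnet installed.
    """
    import soundfile as sf
    from espnet2.bin.asr_inference import Speech2Text

    # Downloads the model on first use (requires espnet_model_zoo).
    speech2text = Speech2Text.from_pretrained(model_tag)

    # Assumption: the file is mono and already at the sampling rate
    # the model was trained on (commonly 16 kHz for ASR models).
    speech, _rate = sf.read(wav_path)

    # Each n-best entry is (text, tokens, token_ids, hypothesis).
    nbests = speech2text(speech)
    text, *_ = nbests[0]
    return text
```

Calling `transcribe("sample.wav")` with a local recording should return the transcript as a string; the file name here is only a placeholder.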

We hope you find it interesting & look forward to your feedback.

Major Models and Accuracies Measured by CER

| Model | JSUT Basic5000 | Common Voice |
| --- | --- | --- |
| Whisper small | 14.4% | 15.2% |
| ESPnet LaboroTVSpeech | 11.7% | 12.6% |
| Whisper medium | 9.9% | 11.4% |
| Whisper large-v2 | 8.2% | 9.7% |
| ESPnet ReazonSpeech | 8.2% | 9.9% |
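As a reminder of what the CER (character error rate) figures mean, here is a minimal sketch of the standard definition: character-level edit distance divided by the number of reference characters. The thread does not say how the numbers above were computed (text normalization, punctuation handling, per-utterance versus pooled averaging), so the pooling over all utterances below is an assumption, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edits over total reference characters."""
    total_edits = sum(edit_distance(r, h)
                      for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars
```

For example, `cer(["今日は晴れ"], ["今日は雨"])` counts one substitution and one deletion against five reference characters, giving 0.4. Lower values are better.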
codecov-commenter commented 1 year ago

Codecov Report

Merging #70 (fd23a20) into master (9bd0f4b) will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master      #70   +/-   ##
=======================================
  Coverage   50.76%   50.76%           
=======================================
  Files           2        2           
  Lines         390      390           
=======================================
  Hits          198      198           
  Misses        192      192           

sw005320 commented 1 year ago

Very exciting! Many thanks!

We'll figure out the CI error (a torchaudio import in this case) later, so please wait a while. Can you also make a recipe for this database in the main espnet? We're happy to help!

fujimotos commented 1 year ago

> We'll figure out the CI error (a torchaudio import in this case) later, so please wait a while.

@sw005320 Thank you for the swift reply!

> Can you also make a recipe for this database in the main espnet? We're happy to help!

Sure! We're working on it right now and will submit a PR to espnet/espnet shortly.

sw005320 commented 1 year ago

Very cool! We're very excited about your work and look forward to the PR!

sw005320 commented 1 year ago

@fujimotos, can you fix https://github.com/espnet/espnet_model_zoo/actions/runs/3955889355/jobs/6778506068#step:8:242 ?

fujimotos commented 1 year ago

> @fujimotos, can you fix https://github.com/espnet/espnet_model_zoo/actions/runs/3955889355/jobs/6778506068#step:8:242 ?

@sw005320 Should be fixed by now!

It turned out that we needed to tweak our repo settings on Hugging Face to permit GitHub Actions to download the model. I confirmed that the CI runs green now:

https://github.com/fujimotos/espnet_model_zoo/actions/runs/3955905294/jobs/6791209291
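For anyone hitting a similar CI failure, here is a rough sketch of checking whether a file in a Hugging Face repo can be fetched without credentials, which is the situation a GitHub Actions runner is in. It uses only the Python standard library. The `resolve/<revision>/<filename>` URL pattern is the Hub's usual download path; the helper names and the injectable `opener` parameter are hypothetical, and the thread does not say which repo setting was actually changed.

```python
import urllib.error
import urllib.request


def resolve_url(repo_id, filename, revision="main"):
    """Build the Hub download URL for one file in a model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def can_fetch_anonymously(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request without credentials succeeds.

    `opener` is injectable so the check can be exercised offline.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # 401/403 typically mean the repo requires a login or access grant.
        return False
    except urllib.error.URLError:
        # Network problems: treat as "not fetchable" for CI purposes.
        return False
```

A call such as `can_fetch_anonymously(resolve_url("reazon-research/reazonspeech-espnet-v1", "README.md"))` would exercise this against the live Hub; `README.md` is only an assumed file name.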

sw005320 commented 1 year ago

Thanks a lot!

I saw several articles about this. Maybe we can have a chat? I think we may be able to further improve the performance with some effort. If you're interested in having a chat, please send me an email (shinjiw@ieee.org).

fujimotos commented 1 year ago

> I saw several articles about this. Maybe we can have a chat? ... If you're interested in having a chat, please send me an email

Sure! Our research head Daijiro Mori (@daijiro) should get in touch with you soon.