[STT] downloading of deepspeech2offline_librispeech is too slow

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

https://paddlespeech.readthedocs.io

Apache License 2.0


[STT] downloading of deepspeech2offline_librispeech is too slow #2513

Closed: trappedinspacetime closed this issue 1 year ago.

trappedinspacetime commented 1 year ago

I am on Ubuntu 20.04 and I live in Türkiye. I installed PaddleSpeech via pip.

When I try to test the offline model with:

    paddlespeech asr --model deepspeech2offline_librispeech --lang en --input ./en.wav -v

it starts downloading, but the download is extremely slow (around 300 KB/s). Is it possible to download the speech model manually and extract it into PaddleSpeech's model directory? I don't know where that directory is. Best regards.

yt605155624 commented 1 year ago

You can find the pretrained models' URLs in https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/paddlespeech/resource/pretrained_models.py

Models are downloaded to ${HOME}/.paddlespeech/models. For deepspeech2offline_librispeech, here is the file tree:

By the way, you must keep the *.tar.gz file: it is used for the MD5 check that lets PaddleSpeech skip re-downloading the model.
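A minimal Python sketch of the manual route, assuming you have already downloaded the archive yourself and copied its md5 value by hand from pretrained_models.py. The helpers `md5_of` and `install_archive` are hypothetical, not PaddleSpeech APIs, and the sub-directory you pass as `target_dir` is an assumption: it should mirror the layout PaddleSpeech itself creates under ~/.paddlespeech/models. The helper never deletes the archive, so you can keep it in that directory afterwards as noted above.

```python
import hashlib
import os
import tarfile

# Default download location mentioned in the thread: ${HOME}/.paddlespeech/models
MODELS_ROOT = os.path.join(os.path.expanduser("~"), ".paddlespeech", "models")


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so large
    archives do not have to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def install_archive(archive_path, expected_md5, target_dir):
    """Hypothetical helper: verify a manually downloaded *.tar.gz and unpack it.

    Raises ValueError if the MD5 does not match the expected value.
    The archive file itself is left untouched (not deleted).
    """
    actual = md5_of(archive_path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)
    return target_dir
```

For example, `install_archive(path, md5, os.path.join(MODELS_ROOT, subdir))`, where `subdir` is whatever directory name your own ~/.paddlespeech/models tree uses for this model.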

trappedinspacetime commented 1 year ago

@yt605155624 Thank you for responding and for the detailed information. I downloaded the /asr0_deepspeech2_offline_librispeech_ckpt_1.0.1.model.tar.gz file and extracted it to the directory as you instructed. But to my surprise, it needs another file of around 9 GB, common_crawl_00.prune01111.trie.klm. Assuming that these files will be loaded into RAM, that is too much for my personal computer with 8 GB.
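The concern above is simple arithmetic: if the ~9 GB language model were read entirely into memory, it could not fit in 8 GB of RAM. A minimal Python sketch of that check, under the same assumption the comment makes (the file is fully loaded); whether the decoder actually loads it fully or maps it from disk is not established in this thread. `headroom` is a hypothetical safety margin for the OS and the acoustic model, and `physical_ram_bytes` uses POSIX `sysconf`, so it targets Linux systems such as Ubuntu.

```python
import os

GB = 1024 ** 3


def physical_ram_bytes():
    """Total physical memory, via POSIX sysconf (works on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_ram(file_bytes, ram_bytes, headroom=0.8):
    """True if a fully loaded file would use at most `headroom` of RAM.

    headroom is a hypothetical margin left for the OS and other memory users.
    """
    return file_bytes <= ram_bytes * headroom


# The numbers from the comment: a ~9 GB file on an 8 GB machine.
print(fits_in_ram(9 * GB, 8 * GB))  # False
```

On a real machine you would call `fits_in_ram(os.path.getsize(path_to_klm), physical_ram_bytes())`, with `path_to_klm` pointing at the downloaded language model file.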

Thank you for all your help. I'm grateful to you. All the best.