common-voice / commonvoice-fr

Tooling for producing French dataset for Common Voice

Use 🐸 STT 1.1.0 #156

Closed by wasertech 2 years ago

wasertech commented 3 years ago

Switching to 🐸 STT

Sometimes to go forward, you need to take two steps back.

Like it or not, 🐸 STT is here to stay, whereas DeepSpeech...

...well, you know.

What does it imply?

For the end user:

For the maintainers:

It currently works on the main branch of my fork of STT, as I’m still in the process of getting the changes we need/deserve for French merged into 🐸’s repo.

Building the image

docker build \
    --rm \
    --build-arg uid=1018 \
    --build-arg gid=1018 \
    -f Dockerfile.train \
    -t commonvoice-fr . && \
docker run \
    -it \
    --gpus=all \
    --privileged \
    --shm-size=1g \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    --mount type=bind,src=/home/waser/Projets/Données/DeepSpeech/data,dst=/mnt \
    commonvoice-fr && \
docker container prune || docker container prune -f
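
The command above hard-codes my uid/gid and my data path. If you want to try it on another machine, a minimal sketch along these lines might help. It uses the same flags, with the host-specific values pulled into variables. DATA_DIR, IMAGE_TAG, mount_arg and build_and_run are hypothetical names I made up for illustration, and passing the current user's ids to the uid/gid build args is an assumption about what Dockerfile.train expects.

```shell
# Sketch only, not part of the repo: same flags as the command above,
# with host-specific values moved into variables (hypothetical names).

# Host directory holding the training data. Bind mounts need an absolute path.
DATA_DIR="${DATA_DIR:-$PWD/data}"
# Tag given to the built image.
IMAGE_TAG="${IMAGE_TAG:-commonvoice-fr}"

# Print the --mount value that binds a host directory to /mnt in the container.
mount_arg() {
    printf 'type=bind,src=%s,dst=/mnt' "$1"
}

# Build the training image, then start a container from it.
build_and_run() {
    # Assumption: the uid/gid build args should match the user owning DATA_DIR.
    docker build \
        --rm \
        --build-arg uid="$(id -u)" \
        --build-arg gid="$(id -g)" \
        -f Dockerfile.train \
        -t "$IMAGE_TAG" . &&
    docker run \
        -it \
        --gpus=all \
        --privileged \
        --shm-size=1g \
        --ulimit memlock=-1 \
        --ulimit stack=67108864 \
        --mount "$(mount_arg "$DATA_DIR")" \
        "$IMAGE_TAG"
}
```

After sourcing this from the repository root, something like `DATA_DIR=/path/to/data build_and_run` should behave like the original command. The container pruning step is left out on purpose, so stopped containers stay around until you remove them yourself.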

For now I'll update this gist with the latest build logs as fast as my 2920X can spit them out.

You can also find the produced alphabet here.

My config FYI

OS: Manjaro Linux x86_64 
Kernel: 5.13.19-2-MANJARO 
Shell: zsh 5.8 
CPU: AMD Ryzen Threadripper 2920X (24) @ 3.500GHz
GPU 1: NVIDIA TITAN RTX 
GPU 2: NVIDIA TITAN RTX
Memory: 96422MiB

wasertech commented 2 years ago

Docker image finally builds!

Logs are in the gist

Next: update and migrate scripts to the STT ones.

wasertech commented 2 years ago

Sometimes to go forward, you need to take two steps back.