coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0

Bug: Memory leak when training with vorbis #2287

Closed (bernardohenz closed 1 year ago)

bernardohenz commented 2 years ago

Description: I've successfully trained several models using WAV files, but when I train similar setups (same n_hidden, batch size, etc.) on Vorbis .ogg files, memory fills up very quickly. After some investigation (see below), I think there may be a memory leak when loading Ogg Vorbis (and possibly Opus as well).

I've tried freeing some structures, with no success.

To Reproduce: I've created a fork with a branch for testing this memory leak: https://github.com/bernardohenz/STT/tree/ogg_memory_leak

Steps to reproduce the behavior:

  1. Clone the branch (https://github.com/bernardohenz/STT/tree/ogg_memory_leak)
  2. Build the Docker image for training (docker build -f Dockerfile.train . -t stt-train:latest)
  3. Run the container: docker run -it stt-train:latest (I also used the following options: --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864)
  4. Run one of the following bash scripts (they simply train using a CSV with 100k lines of the sample audio):
    • ./bin/run-ci-ldc93s1-wav_100k.sh
    • ./bin/run-ci-ldc93s1-vorbis_100k.sh

Expected behavior: I don't know how to run a profiler on the training, but I printed the Docker memory usage at around the same iteration number for the WAV run and the Vorbis run (screenshots: wav_100k, vorbis_100l).

For an actual training setup (more than 400k samples, most of them longer than the sample clip), my 32 GB of RAM fills up in no time.
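Editor's note: the per-iteration memory comparison described above can also be gathered from inside a Python process without a profiler. The following is a minimal sketch, not part of STT: `rss_mib` and `sample_memory` are hypothetical helper names, and the code assumes a Unix-like system (it reads `/proc/self/status` on Linux and falls back to the peak RSS reported by the standard `resource` module elsewhere). It measures the resident set size of the whole process, so it would also see growth caused by native (non-Python) allocations.

```python
import resource
import sys


def rss_mib() -> float:
    """Return this process's resident set size in MiB (peak RSS on fallback)."""
    try:
        # Linux: the current RSS is reported in kB on the VmRSS line.
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024
    except OSError:
        pass
    # Fallback: peak RSS from getrusage (bytes on macOS, kB on Linux).
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024


def sample_memory(step_fn, n_steps, every=100):
    """Call step_fn(step) n_steps times and record (step, RSS) every `every` steps."""
    samples = []
    for step in range(1, n_steps + 1):
        step_fn(step)  # hypothetical stand-in for one training or loading step
        if step % every == 0:
            samples.append((step, rss_mib()))
    return samples
```

Running `sample_memory` once with a WAV-loading step and once with a Vorbis-loading step, then comparing the two lists at the same step numbers, would give the same kind of comparison as the screenshots, assuming the loading code can be called in isolation.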


Additional context: I noticed that the leaked memory is freed at the end of the epoch. The following screenshot shows the start of the 2nd epoch using far less memory: vorbis_100k_new_epoch

wasertech commented 2 years ago

STT doesn't officially support the Ogg Vorbis format. Convert the audio to either WAV or Opus, in mono-channel 16 kHz format.
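Editor's note: one way to do the suggested conversion is sketched below. It assumes the `ffmpeg` command-line tool is installed and on the PATH, which the thread does not mention; `build_ffmpeg_cmd` and `convert_to_wav` are hypothetical helper names, not STT functions. The `-ac 1` option requests a single (mono) channel and `-ar 16000` requests a 16 kHz sample rate.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: str, dst: str, sample_rate: int = 16000) -> list:
    """Build an ffmpeg command that writes mono audio at the given sample rate."""
    return ["ffmpeg", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]


def convert_to_wav(src: str) -> str:
    """Convert one audio file to a .wav file next to it (assumes ffmpeg on PATH)."""
    dst = str(Path(src).with_suffix(".wav"))
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
    return dst
```

For example, `convert_to_wav("clips/sample.ogg")` would write `clips/sample.wav` (the path is illustrative). The training CSV would then need to reference the converted files.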