Description
I've successfully trained several models using wav files, but when training similar setups (same `n_hidden`, `batch-size`, etc.) on Vorbis .ogg files, my memory fills up very quickly. After some investigation (see below), I think there is a memory leak when loading Ogg Vorbis (and possibly Opus as well).
I've tried freeing some structures, with no success.
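In case it helps, here is a minimal sketch of the kind of loop I would use to watch memory grow outside of training. It assumes the `soundfile` package for Ogg Vorbis decoding and `psutil` for reading RSS; the actual STT loader uses its own decoding path, so this only illustrates the measurement, not the leaking code itself.

```python
# Minimal sketch: repeatedly decode one .ogg file and print process RSS.
# Assumes `soundfile` (libsndfile) and `psutil` are installed; the real STT
# training loop uses its own Vorbis decoding, so this only shows how memory
# growth can be measured, not the leaking code path itself.
import os

import psutil
import soundfile as sf

PROC = psutil.Process(os.getpid())

def rss_mib() -> float:
    """Resident set size of this process, in MiB."""
    return PROC.memory_info().rss / (1024 * 1024)

if __name__ == "__main__":
    path = "sample.ogg"  # hypothetical test file
    print(f"start: {rss_mib():.1f} MiB")
    for i in range(1, 100_001):
        data, sample_rate = sf.read(path)  # decode the whole file
        del data                           # drop the buffer explicitly
        if i % 10_000 == 0:
            print(f"after {i:6d} decodes: {rss_mib():.1f} MiB")
```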
To Reproduce
I've created a fork and a branch for testing this memory leak: https://github.com/bernardohenz/STT/tree/ogg_memory_leak

Steps to reproduce the behavior:
1. Build the Docker image for training: `docker build -f Dockerfile.train . -t stt-train:latest`
2. Run the container: `docker run -it stt-train:latest` (I've also used the following options: `--gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864`)
3. Run one of the following scripts (they just train on a CSV with 100k lines of the sample audio):
   - `./bin/run-ci-ldc93s1-wav_100k.sh`
   - `./bin/run-ci-ldc93s1-vorbis_100k.sh`
Expected behavior
I don't know how to run a profiler on the training, but I printed the Docker container's memory usage around the same iteration number, comparing the wav and vorbis runs:
For an actual training setup (more than 400k samples, most of them longer than the sample audio), my 32 GB of RAM fills up in no time.
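For reference, this is roughly how I polled the memory use from inside the container while training runs (a sketch assuming `psutil` is available; `docker stats` on the host reports essentially the same numbers, since `/proc/meminfo` inside the container reflects the host):

```python
# Rough sketch: poll system memory use while training runs in the container.
# Assumes `psutil` is installed; inside a container, psutil reads the host's
# /proc/meminfo, so this tracks the same 32 GB the training fills up.
import time

import psutil

def poll(interval_s: float = 30.0) -> None:
    while True:
        mem = psutil.virtual_memory()
        used_gib = (mem.total - mem.available) / 1024**3
        print(f"used: {used_gib:.2f} GiB ({mem.percent:.1f}%)", flush=True)
        time.sleep(interval_s)

if __name__ == "__main__":
    poll()
```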
Environment (please complete the following information):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Mint 20.2 Cinnamon (based on Ubuntu 20.04)
TensorFlow installed from (our builds, or upstream TensorFlow): Installed through your Dockerfile
TensorFlow version (use command below): TensorFlow Version 1.15.5
Python version: Python 3.8.10
CUDA/cuDNN version: 11.7 on the host PC; inside Docker I received the warning: WARNING: CUDA Minor Version Compatibility mode ENABLED. Using driver version 470.141.03 which has support for CUDA 11.4. This container was built with CUDA 11.6 and will be run in Minor Version Compatibility mode.
GPU model and memory: NVIDIA Titan Xp (12 GB); RAM: 32 GB
Additional context
I noticed that the leaked memory is freed at the end of the epoch; the following print shows the start of the 2nd epoch using far less memory:
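Since the memory is reclaimed between epochs, one way to narrow things down is to check whether the allocations are even visible to Python. Below is a sketch using the standard-library `tracemalloc` module; `load_batch` is a hypothetical stand-in for the actual data-loading step. If RSS keeps climbing while these diffs stay flat, the leak is likely in native (C/C++) decoding code, where a tool like valgrind would be needed instead.

```python
# Sketch: diff Python-level allocations between two points in an epoch using
# the standard-library tracemalloc module. tracemalloc only sees allocations
# made through Python's allocator; if RSS grows while these diffs stay flat,
# the leak is likely in native (C/C++) decoding code instead.
import tracemalloc

def load_batch():
    """Hypothetical stand-in for the actual ogg-loading step."""
    ...

tracemalloc.start()
before = tracemalloc.take_snapshot()

for _ in range(1000):
    load_batch()

after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)  # top 10 Python allocation sites by growth
```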