Kyubyong / tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Apache License 2.0

Inefficient RAM usage? #37

Open chief7 opened 7 years ago

chief7 commented 7 years ago

Hey there! First of all: thanks for the amazing work!

I've hit a problem where my train.py process gets killed by the Linux kernel's OOM killer. My question is: has anyone else experienced the same? I suspect there's some kind of inefficient RAM usage (probably the normalization computations, though I'm only just starting to look into it).

I used a different data set but the rest of the code is exactly the same.

Thanks in advance.

EDIT: Forgot to mention that this happens even though I'm running on GPU!

chief7 commented 7 years ago

Update on this one: I did a little research and it seems that the audio features in data_load.py are recalculated over and over again.

As a first hack I wrote a script that calculates the features for all the wav files once and saves them to disk using pickle. This brings CPU load down from around 600%-700% on my 8-core machine to around 150%.

Anyway - this doesn't fix RAM consumption.

zuoxiang95 commented 7 years ago

@chief7 I used a different data set too, and I also see low GPU usage. Did you solve this problem?

chief7 commented 7 years ago

@zuoxiang95 As my dataset is fairly small (~12 GB of features) I managed to load it all into RAM, which improved both training speed and GPU usage.

EDIT: I wrote a small script that preprocesses the files and saves all features to numpy files. That seems to do the trick as well.
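For reference, a minimal sketch of the caching approach described above: extract features once, save them as .npy files, and load the cached arrays instead of recomputing spectrograms every epoch. The function and directory names here are hypothetical, and `compute_features` is a placeholder for the repo's actual spectrogram extraction (which uses librosa).

```python
import os
import numpy as np

def compute_features(wav_path):
    # Placeholder: in the real repo this would be the mel/linear
    # spectrogram extraction from the wav file (via librosa).
    return np.random.rand(100, 80).astype(np.float32)

def precompute(wav_paths, cache_dir="features"):
    """Extract features once and cache them as .npy files, so the
    data loader can read arrays from disk instead of recomputing
    spectrograms on every pass over the dataset."""
    os.makedirs(cache_dir, exist_ok=True)
    for path in wav_paths:
        out = os.path.join(cache_dir, os.path.basename(path) + ".npy")
        if not os.path.exists(out):  # skip files that are already cached
            np.save(out, compute_features(path))

def load_cached(wav_path, cache_dir="features"):
    """Load the precomputed feature array for one wav file."""
    return np.load(os.path.join(cache_dir, os.path.basename(wav_path) + ".npy"))
```

If the whole feature set fits in RAM (as in the ~12 GB case above), you can also load every cached array into a dict up front and skip disk reads entirely during training.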

gdahlm commented 7 years ago

Related to GPU memory usage: if you have a GPU with plenty of RAM, it is useful to add the following to train.py so that you can run eval.py from time to time without stopping the training run (although it will be slow).

I have 11 GB on a 1080 Ti, so 60% works well for me, and my batch rate stayed around ~4000/h.

I will try to commit back some code if I produce anything useful but I am benchmarking some tensorflow bazel options right now.

115a116,117
>         config.gpu_options.allow_growth = True
>         config.gpu_options.per_process_gpu_memory_fraction = 0.6
118c120
<         with sv.managed_session() as sess:
---
>         with sv.managed_session(config=config) as sess:
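Put together, the diff above amounts to the following (a sketch against the TF 1.x, Supervisor-based training loop this repo uses; the 0.6 fraction is the value that happened to work on an 11 GB card, not a general recommendation):

```python
import tensorflow as tf

# Cap this process at 60% of GPU memory and allocate it on demand,
# leaving headroom so eval.py can run alongside training.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.6

# ...then pass the config into the existing managed session in train.py:
# with sv.managed_session(config=config) as sess:
#     ...
```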