Closed · tranmanhdat closed this issue 3 years ago
What is your longest audio?
Approximately 24s. I found that after I released the cache and restarted Docker, the bug was gone. Maybe the Docker container didn't release all of its cache!
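For anyone hitting the same symptom, here is a minimal sketch (assuming a Linux host; not specific to this trainer) of watching the buff/cache figure grow, and of dropping the page cache without a full Docker restart:

```shell
# Sum Buffers + Cached from /proc/meminfo (values are in kB) to watch
# the page cache grow while training runs.
buffcache_kb() {
  awk '/^(Buffers|Cached):/ { sum += $2 } END { print sum }' /proc/meminfo
}

echo "buff/cache now: $(buffcache_kb) kB"

# If the cache keeps growing, it can be dropped on the HOST without
# restarting the container (root only; data is synced first, so nothing
# is lost -- the cache simply refills later):
#   sync && echo 3 > /proc/sys/vm/drop_caches
```

Running this in a loop (e.g. under `watch`) during training makes it easy to see whether the growth really is page cache or resident process memory.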
Bug Description
Memory is not released while training a model: the buff/cache portion of memory grows once training starts, eventually leading to OOM.
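A general Linux note (an assumption about the cause, not a confirmed diagnosis): buff/cache is normally reclaimable by the kernel under pressure, so a genuine OOM usually means something else is pinning memory. The `MemAvailable` field already discounts reclaimable cache and is a better indicator than "free" memory alone:

```shell
# MemAvailable estimates how much memory the kernel could actually
# reclaim and hand out, including droppable page cache.
awk '/^MemAvailable:/ { print $2 " kB available" }' /proc/meminfo
```

If this value stays healthy while buff/cache climbs, the cache growth itself is harmless; if it shrinks toward zero, the training processes are the ones holding memory.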
Reproduction Steps
Follow the tutorial and train on custom data (approximately 500 hours, ~150 GB).
Platform and Hardware
Ubuntu 18.04.5 LTS; Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz; 2x Tesla V100 GPUs; 64 GB RAM
Additional Context
Running with the latest CUDA Docker image; the architecture is the same as in the tutorial. (Screenshots attached: my flags file, memory usage while training runs, and my GPU training processes.)