Closed: zefyrr closed this issue 3 years ago.
Can you post the exact error? (e.g. a screenshot or copy & paste)
Please find attached: config.json.txt, out.log
I have trained on an RTX 2060 Super and an RTX Titan. Depending on your utterance length, 11 GB may not always be enough. For LJSpeech with FP16, 8 GB seems to be enough. I'm on PyTorch 1.6 with CUDA 10.2 and 11.1 installed.
Also, are you training against LibriTTS and validating against LJSpeech?
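In case it helps, a minimal sketch of what FP16 / mixed-precision training looks like with torch.cuda.amp in PyTorch 1.6+; the tiny linear model and random data below are placeholders, not Flowtron's actual training loop:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Stand-in model and data; the real model, optimizer, and loader go here.
model = torch.nn.Linear(80, 80).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = GradScaler()  # scales the loss so FP16 gradients don't underflow

for step in range(10):
    x = torch.randn(16, 80, device="cuda")
    optimizer.zero_grad()
    with autocast():                  # forward pass runs in mixed precision
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()     # backprop on the scaled loss
    scaler.step(optimizer)            # unscales gradients, then steps
    scaler.update()
```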
The error
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
doesn't match anything I've seen from a VRAM OOM.
My guess is that the CUDA / cuDNN / PyTorch install is invalid.
(I've trained using an RTX 2080 Ti and a GTX 1080 Ti on this repo and had no problems; I train on Linux and do inference on Windows and Linux.)
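A quick way to sanity-check the install, using standard PyTorch introspection calls (a minimal sketch, run in the same environment that launches train.py):

```python
import torch

# Report what this PyTorch build was compiled against and what it can see.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("cuDNN version:", torch.backends.cudnn.version())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # A tiny cuDNN-backed op; a broken CUDA/cuDNN/PyTorch combination tends
    # to fail here with errors like CUDNN_STATUS_MAPPING_ERROR.
    conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
    out = conv(torch.randn(1, 3, 32, 32, device="cuda"))
    print("conv ok:", out.shape)
```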
@zefyrr which PyTorch version are you on, and are you using AMP?
Thanks for all the comments. I've updated to the Docker image nvcr.io/nvidia/pytorch:20.07-py3, which has PyTorch 1.6 and CUDA 11.
Here are the latest config_libritts.json.txt and train.log.txt.
I'm now seeing:
Traceback (most recent call last):
File "train.py", line 300, in
I resampled LibriTTS using this command: sox input.wav -r 22050 output.wav
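For reference, a rough way to apply that sox command across the whole dataset (a sketch; the directory names and the subprocess approach are assumptions, not from this thread):

```python
import subprocess
from pathlib import Path

# Assumed layout; adjust to wherever LibriTTS actually lives.
src_dir = Path("LibriTTS")          # original 24 kHz wavs
dst_dir = Path("LibriTTS_22050")    # resampled copies

for wav in src_dir.rglob("*.wav"):
    out = dst_dir / wav.relative_to(src_dir)
    out.parent.mkdir(parents=True, exist_ok=True)
    # Same command as above, applied per file: sox <in> -r 22050 <out>
    subprocess.run(["sox", str(wav), "-r", "22050", str(out)], check=True)
```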
@rafaelvalle: I'm not using AMP, just a single GPU. That should be OK, right?
I was able to resolve the issue 👍. This was helpful: https://github.com/NVIDIA/flowtron/issues/9#issuecomment-629628804
I wanted to check if there is much experience in the community with training on the LibriTTS dataset with a 2080 Ti (11 GB). I was able to downsample the LibriTTS dataset to 22050 Hz, and then hit a wall with a CUDA device-side assert error.
Any recommendations would be greatly appreciated! Thanks.
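As a general debugging aid for device-side asserts (a generic tip, not the fix used in this thread): forcing synchronous kernel launches makes the assert surface at the call that actually triggered it, instead of at a later, unrelated line.

```python
import os

# Must be set before any CUDA work happens in the process,
# e.g. at the very top of train.py before torch initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the variable
```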