Closed AlexMRuch closed 4 years ago
Wow! I've been tearing my hair out on this for quite some time now. Couldn't figure out why the CUDA memory wasn't being cleared despite trying every trick in the book! Thank you for pointing out the cause, it definitely helps!
I've been using bash scripts to run the python scripts when I need to do multiple runs and/or hyperparameter tuning (I use W&B sweeps, thinking of adding a class to the library to support this natively as well). Maybe this will help you as well because you can avoid sacrificing the mixed-precision benefits if you use bash to initiate the python scripts.
So glad you think my comments are helpful! I've hit this issue with other PyTorch-related libraries, like `dgl` (Deep Graph Library), so I was beginning to think this was an issue specific to `optuna` or `mlflow`. Happy to have finally narrowed it down (I think). Had I not figured that out, my next step was definitely going to be using bash scripts.
I may still consider using them, given that FP16 does increase speed a bit. Thanks for letting me know that worked for you!
How do you get the hyperparameter "study" to remember each "trial"'s suggested hyperparameters across runs when you use the bash approach? With `optuna`, you do a setup like this:
```python
study = optuna.create_study(
    study_name=studyname,
    direction="minimize",
    sampler=optuna.samplers.TPESampler(seed=random_seed),
)  # no pruner because intermediate results are not captured
study.optimize(
    objective,
    n_trials=args.max_runs,
    n_jobs=1,
    callbacks=[mlflow_callback],
    catch=(RuntimeError,),  # keep sweeping past CUDA memory overflow errors
)
```
With a bash approach, to clear out CUDA, I imagine you'd have to shut down the whole Python process. Do you use `subprocess` to have one Python runtime call another, sending inputs to and grabbing outputs from the child process?
Thanks again for this awesome library! Feel free to close this thread if you wish!
I set up my python script to accept command line arguments for the parameters that I want to "sweep". Then I just write a bash script that sequentially calls the python script with the necessary arguments. It's not exactly elegant though. 😅
I can give an example script later if you need.
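In case it helps in the meantime, here is a rough sketch of the idea: generate one command line per hyperparameter combination and launch each as its own process, so GPU memory is fully released between runs (the `train.py` script and flag names are placeholders for your own setup):

```python
import itertools
import subprocess  # used when you uncomment the run() call below

def sweep_commands(script, grid):
    """Yield one command line per combination in the hyperparameter grid."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cmd = ["python", script]
        for key, value in zip(keys, values):
            cmd += [f"--{key}", str(value)]
        yield cmd

grid = {"learning_rate": [1e-5, 5e-5], "batch_size": [16, 32]}
for cmd in sweep_commands("train.py", grid):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # fresh process per run -> clean GPU
```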
Update: it looks like none of this is necessary with wandb sweeps. It clears out the GPU memory usage between each run.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Describe the bug
`apex` seems to generate a GPU memory leak when using FP16 training/evaluation: https://github.com/NVIDIA/apex/issues/439. This is not a problem in most single-run cases; however, if you are using something like `optuna` to do hyperparameter tuning, the memory leak will throw a CUDA memory overflow error within about 5-10 trials, depending on the language model.

To circumvent this issue, users can set `fp16` to `False` and set `fp16_opt_level` to `O0` (FP32). This avoids the memory leak and allows users to do automated hyperparameter tuning without the CUDA memory overflow error.

I suggest the documentation be updated to note this bug: "Some users experience CUDA memory overflow errors when using automated hyperparameter tuning to run multiple versions of their model back-to-back. This can be avoided by setting `fp16=False` and `fp16_opt_level=O0` to use FP32 instead."

Hope this helps other `simpletransformers` users avoid this issue!

To Reproduce
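For reference, the workaround described above amounts to two entries in the model's train args (a config sketch using the arg names from this issue; the surrounding simpletransformers model setup is omitted, and defaults may vary by version):

```python
# Workaround: disable apex mixed precision entirely (pure FP32).
train_args = {
    "fp16": False,           # avoid the apex FP16 memory leak
    "fp16_opt_level": "O0",  # "O0" = pure FP32 (per the issue text)
}
```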
Expected behavior
Automated hyperparameter tuning should run for the pre-set number of runs without throwing a CUDA memory error, unless that error is due to the model/data being too large to fit on the GPU(s).
Desktop (please complete the following information):
Additional context
This error was raised in an environment specific to `simpletransformers`.