Hi @thunguyenth. To terminate the process on Linux, you can use the command `pkill -9 python`.
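As a rough sketch (assuming the training entry point contains `train` in its command line; adjust the pattern to your actual launch command):

```bash
# List leftover Python processes that may still be holding GPU memory
# (the pattern "train" is an assumption; match it to your own launcher)
ps aux | grep -i "[t]rain"

# Force-kill every matching process
pkill -9 -f train

# Confirm the memory has been released
nvidia-smi
```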
Thank you, @YTEP-ZHI, for your reply ^^
I would appreciate it if someone could explain why the GPU memory is not released after the training is stopped forcibly.
Hi, thank you for sharing your work ^^ I have run into a problem:
The GPU memory is not released if I forcibly stop the training process (by pressing Ctrl-C in the terminal).
Config:
Actions:
Step 1. Train stage 2 on the nuScenes dataset, _v1.0-mini_ version
=> The training process works normally
Step 2. Stop the training after a few iterations of the 1st epoch by pressing Ctrl-C in the terminal
Step 3. Re-run the training from Step 1 => Out-of-memory error!
I checked the GPU state with nvidia-smi, as shown in the screenshot below (taken a few minutes after the training had already stopped): the GPU memory used by the training process from Step 1 was not released (17323MiB / 24259MiB).
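For reference, this is roughly how the processes still holding the memory can be listed (a sketch; the device paths assume the standard NVIDIA device nodes):

```bash
# Show per-process GPU memory usage; empty output while memory is still
# allocated usually means the owning processes are detached or defunct
nvidia-smi --query-compute-apps=pid,used_memory --format=csv

# List every process that still has an NVIDIA device node open
sudo fuser -v /dev/nvidia*
```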
This issue can easily be worked around by releasing the GPU memory manually, but I wonder whether it happens to everyone or only in my setup (I couldn't find a similar issue reported in this repo), and why the GPU memory is not released even though the training has stopped. I would appreciate it if you could help me clarify this.
Best regards.