Closed by njzjz 1 month ago
The `is_oom_error` function in `auto_batch_size.py` has been updated to handle an additional out-of-memory error related to `CUSOLVER_STATUS_INTERNAL_ERROR`. If this error or the standard "CUDA out of memory" message is detected, the function now releases cached memory using `torch.cuda.empty_cache()` before returning `True`.
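A minimal sketch of what the updated check might look like. This is an illustration based on the change summary above, not the exact deepmd implementation; the error-message strings and the `is_oom_error` signature are assumptions:

```python
import torch


def is_oom_error(e: Exception) -> bool:
    """Return True if the exception looks like a GPU out-of-memory error.

    Besides the standard "CUDA out of memory" message, the cuSOLVER
    internal error is also treated as OOM. Cached GPU memory is
    released before returning True so that a retry with a smaller
    batch size has the best chance of succeeding.
    """
    if isinstance(e, RuntimeError) and (
        "CUDA out of memory." in str(e)
        or "CUSOLVER_STATUS_INTERNAL_ERROR" in str(e)
    ):
        # No-op if CUDA was never initialized, so this is safe on CPU too.
        torch.cuda.empty_cache()
        return True
    return False
```

Note that `torch.cuda.empty_cache()` only releases memory held by PyTorch's caching allocator; it does not free tensors that are still referenced.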
| File | Change Summary |
|---|---|
| `deepmd/pt/utils/auto_batch_size.py` | Updated `is_oom_error` to handle `CUSOLVER_STATUS_INTERNAL_ERROR` and release cached memory before returning `True`. |
```mermaid
sequenceDiagram
    participant User
    participant System
    participant GPU
    participant torch
    User->>System: Run computation
    System->>GPU: Execute task
    GPU-->>System: Return error (e.g., CUDA out of memory, CUSOLVER_STATUS_INTERNAL_ERROR)
    System->>System: Check if error is OOM
    alt Error is OOM
        System->>torch: torch.cuda.empty_cache()
        torch-->>System: Cache cleared
        System->>User: Return True
    else Error is not OOM
        System->>User: Return False
    end
```
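The flow above is what enables automatic batch-size reduction: when the check reports OOM, the caller retries with a smaller batch. The following driver loop is a hypothetical illustration of that pattern, not the actual deepmd `AutoBatchSize` class (the names `run_with_auto_batch` and `compute` are invented for this sketch):

```python
def run_with_auto_batch(compute, batch_size, is_oom_error):
    """Retry `compute(batch_size)`, halving the batch size on OOM.

    `is_oom_error` is a predicate like the one described above.
    Non-OOM errors are re-raised immediately.
    """
    while batch_size >= 1:
        try:
            return compute(batch_size)
        except RuntimeError as e:
            if not is_oom_error(e):
                raise
            # OOM: the cache was already emptied by is_oom_error,
            # so just retry with half the batch.
            batch_size //= 2
    raise RuntimeError("out of memory even at batch size 1")
```

Halving on failure converges in O(log n) retries, which is why releasing the cache inside the OOM check matters: each retry starts from as clean a memory state as possible.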
Attention: Patch coverage is 0%; 4 lines in your changes are missing coverage. Please review. Project coverage is 82.52%, comparing base (`12bcc50`) to head (`d3fa08c`).
| Files | Patch % | Lines |
|---|---|---|
| `deepmd/pt/utils/auto_batch_size.py` | 0.00% | 4 Missing :warning: |