bowang-lab / U-Mamba

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
https://arxiv.org/abs/2401.04722
Apache License 2.0
692 stars, 64 forks

How to solve the problem of experiments stalling? #42

Open YUjh0729 opened 5 months ago

YUjh0729 commented 5 months ago

Hello,

When I train the model, training stops at a certain epoch and does not continue. GPU usage sits at 1% and memory usage stays at 12 GB, so the process is still alive. However, it stayed stuck at the same epoch for an entire night without making any progress. What could be causing this? Can you help explain it? [Screenshot 2024-05-30 093003]

Thank you.

zcyrique commented 3 months ago

Hi @YUjh0729, I'm having the same issue as you! Were you able to solve it? Any help would be greatly appreciated. @JunMa11, could you help with this one? Thank you!

YUjh0729 commented 2 months ago

Hi @zcyrique, I've tried all the solutions suggested in the issues, but none of them resolved the problem.

zcyrique commented 2 months ago

Thank you for the reply, @YUjh0729. Let's wait and see whether anyone who has solved this issue can help.
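
In the meantime, one generic way to narrow down a stall like this is to have Python dump the stack of every thread when a step takes too long, which at least shows where the process is waiting. The following is a minimal sketch using only the standard-library `faulthandler` module. It is not part of U-Mamba, and `run_with_watchdog`, `step_fn`, and the log path are hypothetical names chosen for illustration. It also assumes the stall happens in the process that runs this code; a hang inside a separate worker process would not show up in the dump.

```python
import faulthandler


def run_with_watchdog(step_fn, timeout_s, log_path):
    """Run step_fn(); if it is still running after timeout_s seconds,
    write the traceback of every thread to log_path (and keep doing so
    every timeout_s seconds until step_fn returns).

    Hypothetical helper for illustration only.
    """
    with open(log_path, "w") as log:
        # A watchdog thread writes directly to the file descriptor,
        # so the dump works even if the main thread is blocked.
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log)
        try:
            return step_fn()
        finally:
            # Always disarm the watchdog before the file is closed.
            faulthandler.cancel_dump_traceback_later()
```

Wrapping the call that runs one epoch (or the whole training entry point) in `run_with_watchdog` with a timeout well above a normal epoch duration leaves the log empty on healthy runs. If the run stalls, the last traceback in the log shows which line each thread is blocked on, which is useful detail to attach to this issue.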