Closed JinnnK closed 3 months ago
Hello, I have also run into this problem recently. Do you have a solution? If so, could you share it?
@Samsara011
Hello, stage 2 requires at least 96GB of memory, so you will need to upgrade your memory.
@JinnnK Thanks for your reply! But the paper says it can be trained with 16GB of video memory. Is that because it uses four GPUs for training? If so, that does not match the description in the paper. Looking forward to your reply!
@Samsara011 I mean 96GB of RAM, not VRAM. For VRAM, it requires less than 18GB.
@JinnnK Oh, I see. Thank you for your reply!
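For anyone else hitting this, a quick sanity check is to confirm the machine actually has the ~96GB of RAM mentioned above before launching stage 2. This is a hypothetical helper, not part of the repo; it uses only the standard library (`os.sysconf`, available on Linux):

```python
import os

def total_ram_gb() -> float:
    """Return total physical RAM in GB via POSIX sysconf (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    num_pages = os.sysconf("SC_PHYS_PAGES")  # total physical pages
    return page_size * num_pages / (1024 ** 3)

REQUIRED_GB = 96  # stage 2 requirement mentioned in this thread

ram = total_ram_gb()
print(f"Total RAM: {ram:.1f} GB")
if ram < REQUIRED_GB:
    print(f"Warning: stage 2 reportedly needs ~{REQUIRED_GB} GB of RAM.")
```

If this prints well under 96GB, the training process will likely be killed by the OS partway through rather than failing with a clean VRAM error.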
Hello,
Thank you for your continuous contributions to this excellent research.
I am writing to report an issue while training stage 2. I consistently face the following error after completing the first epoch. Despite various attempts to resolve it, including tuning `samples_per_gpu` and `workers_per_gpu` in the configuration and resetting `data_root`, there has been no progress. Initially, I suspected a VRAM shortage; however, the process only consumes about 15GB on a single GPU, so I believe the issue lies elsewhere.
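For reference, the settings I tried adjusting look roughly like this in an mmcv-style Python config (the field names come from the repo's configuration; the values here are just the ones I experimented with, not the defaults):

```python
# Sketch of the relevant data settings in an mmcv-style config file.
# Values are illustrative; the dataset path is one I reset while debugging.
data_root = 'data/my_dataset/'

data = dict(
    samples_per_gpu=2,   # batch size per GPU; lowering this did not help
    workers_per_gpu=2,   # dataloader worker processes per GPU
)
```

Lowering either value changes memory pressure per GPU and per dataloader worker, which is why they were the first things I tried.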
Here are the specifications of my system:
Below is the error log:
I would appreciate any insights or suggestions you might have to help resolve this issue.
Thank you.