Yaziwel / Restore-RWKV

Restore-RWKV: Efficient and Effective Medical Image Restoration with RWKV

Can it train on multiple GPUs? #9

Open zdyshine opened 2 months ago

zdyshine commented 2 months ago

Thank you for open-sourcing this. I tried running the code: single-GPU training works, but multi-GPU training runs into the following problem:

Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/wkv/build.ninja...
Building extension module wkv...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module wkv...
Loading extension module wkv...
Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/wkv/build.ninja...
Building extension module wkv...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module wkv...
Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/wkv/build.ninja...
Building extension module wkv...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module wkv...
Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/wkv/build.ninja...
Building extension module wkv...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module wkv...
Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/wkv/build.ninja...
Building extension module wkv...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.

Are you able to train on multiple GPUs on your side?

Yaziwel commented 1 month ago

Multi-GPU training does work. Try deleting the cached build at /root/.cache/torch_extensions/py310_cu118/wkv and running again.
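
For anyone who would rather script that cleanup than delete the folder by hand, here is a minimal Python sketch. The helper name `clear_extension_cache` is hypothetical and not part of Restore-RWKV; the path is the one from the log above and will differ with other Python/CUDA versions. One possible reason this helps (an assumption, not confirmed in this thread) is that an interrupted earlier build can leave a stale `lock` file in that directory, which makes later processes wait.

```python
import shutil
from pathlib import Path


def clear_extension_cache(cache_dir: str) -> bool:
    """Delete a cached PyTorch extension build directory.

    Returns True if the directory existed and was removed, False otherwise.
    (Hypothetical helper for illustration; not part of the repository.)
    """
    path = Path(cache_dir).expanduser()
    if not path.is_dir():
        # Nothing cached at this location, so there is nothing to delete.
        return False
    shutil.rmtree(path)
    return True


# Example call, using the path from the log in this issue:
# clear_extension_cache("/root/.cache/torch_extensions/py310_cu118/wkv")
```

Run it once before launching the multi-GPU job, while no training process is still using the directory; the wkv extension should then be rebuilt on the next start.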