HJ-harry / score-MRI

Apache License 2.0

Possibility to run on CPU #1

Closed by zaccharieramzi 2 years ago

zaccharieramzi commented 2 years ago

Hi,

I am very interested in your work, and I would like to reproduce its results as well as try to extend some of it.

I am just starting by running the retrospective inference command for real-valued data:

python inference_real.py --task 'retrospective' \
                    --data '001' \
                    --mask_type 'gaussian1d' \
                    --acc_factor 4 \
                    --center_fraction 0.08 \
                    --N 2000

I am currently running into the following error when trying to run on CPU:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "/home/zaccharie/workspace/score-MRI/inference_real.py", line 7, in <module>
    from models import ncsnpp
  File "/home/zaccharie/workspace/score-MRI/models/ncsnpp.py", line 18, in <module>
    from . import utils, layers, layerspp, normalization
  File "/home/zaccharie/workspace/score-MRI/models/layerspp.py", line 20, in <module>
    from . import up_or_down_sampling
  File "/home/zaccharie/workspace/score-MRI/models/up_or_down_sampling.py", line 10, in <module>
    from op import upfirdn2d
  File "/home/zaccharie/workspace/score-MRI/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/zaccharie/workspace/score-MRI/op/fused_act.py", line 11, in <module>
    fused = load(
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1144, in load
    return _jit_compile(
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1456, in _write_ninja_file_and_build_library
    _write_ninja_file_to_build_library(
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1857, in _write_ninja_file_to_build_library
    cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1626, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range

I tried to fix it by passing with_cuda=torch.cuda.is_available() to the fused = load(...) call, but got the following error:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/zaccharie/workspace/score-MRI/inference_real.py", line 7, in <module>
    from models import ncsnpp
  File "/home/zaccharie/workspace/score-MRI/models/ncsnpp.py", line 18, in <module>
    from . import utils, layers, layerspp, normalization
  File "/home/zaccharie/workspace/score-MRI/models/layerspp.py", line 20, in <module>
    from . import up_or_down_sampling
  File "/home/zaccharie/workspace/score-MRI/models/up_or_down_sampling.py", line 10, in <module>
    from op import upfirdn2d
  File "/home/zaccharie/workspace/score-MRI/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/zaccharie/workspace/score-MRI/op/fused_act.py", line 11, in <module>
    fused = load(
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1144, in load
    return _jit_compile(
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/2] c++ -MMD -MF fused_bias_act_kernel.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/include -isystem /home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/include/TH -isystem /home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/include/THC -isystem /home/zaccharie/workspace/score-MRI/venv/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/zaccharie/workspace/score-MRI/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.o 
c++: warning: /home/zaccharie/workspace/score-MRI/op/fused_bias_act_kernel.cu: linker input file unused because linking not done
[2/2] c++ fused_bias_act.o fused_bias_act_kernel.o -shared -L/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o fused.so
FAILED: fused.so 
c++ fused_bias_act.o fused_bias_act_kernel.o -shared -L/home/zaccharie/workspace/score-MRI/venv/lib/python3.9/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o fused.so
c++: error: fused_bias_act_kernel.o: No such file or directory
ninja: build stopped: subcommand failed.

One extra thing I did before running these commands was to install Ninja, which is not listed in the requirements.

zaccharieramzi commented 2 years ago

Actually, I just needed to add an if torch.cuda.is_available(): condition before loading any CUDA source. I will submit a PR if you are interested.
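For readers hitting the same error, the guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from the merged PR: the helper names `cuda_available` and `load_fused_extension` and the `available` override are hypothetical, and the source file names are inferred from the build log above. Only `torch.cuda.is_available()` and `torch.utils.cpp_extension.load` are real PyTorch calls.

```python
import importlib.util
import os


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def load_fused_extension(sources, available=None):
    """JIT-compile the CUDA extension only when CUDA is usable.

    Hypothetical helper: returns None on CPU-only machines so that importing
    the module never triggers an nvcc/ninja build. The caller is then
    responsible for using a plain PyTorch code path instead.
    """
    if available is None:
        available = cuda_available()
    if not available:
        return None
    # Imported lazily so CPU-only environments never touch the build machinery.
    from torch.utils.cpp_extension import load
    return load(name="fused", sources=list(sources))


# File names assumed from the ninja log in this thread.
module_path = os.path.dirname(os.path.abspath(__file__)) if "__file__" in globals() else os.getcwd()
fused_sources = [
    os.path.join(module_path, "fused_bias_act.cpp"),
    os.path.join(module_path, "fused_bias_act_kernel.cu"),
]
```

With a guard like this, `fused = load_fused_extension(fused_sources)` yields `None` on a machine without CUDA, so any function that would call into the compiled kernel needs to check for that and fall back to standard PyTorch operations.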

HJ-harry commented 2 years ago

Hi @zaccharieramzi, thanks for your interest in our work, and thank you for the useful PR! I hadn't really thought about running the experiments on CPU, since diffusion models are already slow on GPUs. Still, I believe it would be useful for researchers who currently only have access to CPUs.

zaccharieramzi commented 2 years ago

In my case, it's not necessarily about not having a GPU. It's more that I want to be able to write unit tests for what I am doing, and potentially run them in CI.
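As an illustration of that use case, a test module can keep CPU-only checks always enabled and skip GPU-specific ones when no device is present, so the same suite passes on a CPU-only CI runner. The sketch below uses the standard library's `unittest`; `leaky_relu_reference` is a hypothetical pure-Python stand-in for the code under test and is not part of this repository.

```python
import importlib.util
import unittest


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def leaky_relu_reference(x: float, negative_slope: float = 0.2) -> float:
    """Hypothetical stand-in for the code under test (scalar leaky ReLU)."""
    return x if x >= 0 else negative_slope * x


class ActivationTests(unittest.TestCase):
    def test_cpu_reference(self):
        # Runs everywhere, including CPU-only CI runners.
        self.assertEqual(leaky_relu_reference(2.0), 2.0)
        self.assertAlmostEqual(leaky_relu_reference(-1.0), -0.2)

    @unittest.skipUnless(cuda_available(), "requires a CUDA device")
    def test_cuda_device_visible(self):
        # Only executed when a GPU is actually present.
        import torch
        self.assertGreater(torch.cuda.device_count(), 0)


def run_tests() -> unittest.TestResult:
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ActivationTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

On a machine without CUDA, the GPU test is reported as skipped rather than failed, which keeps the CI run green while still exercising the CPU path.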

Regarding the PR, feel free to merge it (I don't have access to do so).

If you are interested, I also have a branch where I reformatted the code to make it easy to run such unit tests.

HJ-harry commented 2 years ago

Thanks, I've merged the PR now.

Would you mind submitting a PR about the unit tests? That would be valuable.