CASP-Systems-BU / GCNSplit

Streaming Graph Partitioning
Apache License 2.0

Repository Setup: Docker Build Failing #1

Open fidsusj opened 1 year ago

fidsusj commented 1 year ago

Dear GCNSplit maintainers,

I was trying to set up the repository as described in the README.md. Running "docker build -t local-torch-geometric ." produced the following error:

> [ 4/27] RUN apt-get update && apt-get install -y --no-install-recommends cuda-cudart-10-1=10.1.243-1 cuda-compat-10-1 && ln -s cuda-10.1 /usr/local/cuda && rm -rf /var/lib/apt/lists/*:
#7 0.287 Get:1 http://ports.ubuntu.com/ubuntu-ports bionic InRelease [242 kB]
#7 0.320 Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease [1581 B]
#7 0.388 Err:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
...
#7 3.611 Reading package lists...
#7 4.173 W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
#7 4.173 E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease' is not signed.

Applying the fixes proposed in https://github.com/NVIDIA/nvidia-container-toolkit/issues/258 leads to new errors in the following steps: the cuda-cudart-10-1 and cuda-compat-10-1 packages cannot be located.
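The GPG warning in the log above names the signing key that apt could not find. As a minimal sketch for pulling that key ID out of captured build output (the helper name missing_pubkeys is hypothetical, and the sample text is the warning line copied from the log), something like the following could be used:

```python
import re
from typing import List


def missing_pubkeys(apt_output: str) -> List[str]:
    """Return key IDs that apt reports as NO_PUBKEY, in order of first appearance."""
    seen: List[str] = []
    # apt prints "NO_PUBKEY <hex key id>" for every signature it cannot verify.
    for key_id in re.findall(r"NO_PUBKEY\s+([0-9A-Fa-f]{8,40})", apt_output):
        key_id = key_id.upper()
        if key_id not in seen:
            seen.append(key_id)
    return seen


# Warning text copied from the build log in this issue.
sample = (
    "W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/"
    "ubuntu1804/x86_64  InRelease: The following signatures couldn't be "
    "verified because the public key is not available: "
    "NO_PUBKEY A4B469963BF863CC"
)

print(missing_pubkeys(sample))
```

The extracted ID only identifies which key is missing. How that key gets installed in the image depends on the Dockerfile and is covered by the linked issue.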

Could you check the Dockerfiles needed for installation again (clearing the Docker build cache beforehand with docker builder prune) and bring them up to date? Alternatively, could you provide a pre-built Docker image on Docker Hub once the build works?

Thanks a lot and kind regards!

soniahorchidan commented 1 year ago

Hello! Thank you for raising this issue and for your interest in the project! I just updated the Dockerfile, and the installation should hopefully be fixed now. Could you please try again and let us know if it worked?

fidsusj commented 1 year ago

Thanks a lot for including the fix! Unfortunately, as expected, this leads to the following follow-up error:

 > [ 9/32] RUN apt-get update && apt-get install -y --no-install-recommends         cuda-cudart-10-1=10.1.243-1         cuda-compat-10-1 &&     ln -s cuda-10.1 /usr/local/cuda &&     rm -rf /var/lib/apt/lists/*:
#12 0.557 Hit:1 http://ports.ubuntu.com/ubuntu-ports bionic InRelease
#12 0.654 Get:2 http://ports.ubuntu.com/ubuntu-ports bionic-updates InRelease [88.7 kB]
#12 0.995 Get:3 http://ports.ubuntu.com/ubuntu-ports bionic-backports InRelease [74.6 kB]
#12 1.191 Get:4 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease [88.7 kB]
#12 1.370 Get:5 http://ports.ubuntu.com/ubuntu-ports bionic-updates/universe arm64 Packages [2064 kB]
#12 2.375 Get:6 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 Packages [2018 kB]
#12 3.015 Get:7 http://ports.ubuntu.com/ubuntu-ports bionic-security/main arm64 Packages [1639 kB]
#12 3.478 Get:8 http://ports.ubuntu.com/ubuntu-ports bionic-security/universe arm64 Packages [1367 kB]
#12 10.28 Err:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
#12 10.28   Temporary failure resolving 'developer.download.nvidia.com'
#12 20.28 Err:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
#12 20.28   Temporary failure resolving 'developer.download.nvidia.com'
#12 20.30 Fetched 7339 kB in 20s (367 kB/s)
#12 20.30 Reading package lists...
#12 20.87 W: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/InRelease  Temporary failure resolving 'developer.download.nvidia.com'
#12 20.87 W: Failed to fetch https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/InRelease  Temporary failure resolving 'developer.download.nvidia.com'
#12 20.87 W: Some index files failed to download. They have been ignored, or old ones used instead.
#12 20.89 Reading package lists...
#12 21.44 Building dependency tree...
#12 21.53 Reading state information...
#12 21.55 E: Unable to locate package cuda-cudart-10-1
#12 21.55 E: Unable to locate package cuda-compat-10-1
------
executor failed running [/bin/sh -c apt-get update && apt-get install -y --no-install-recommends         cuda-cudart-$CUDA_PKG_VERSION         cuda-compat-10-1 &&     ln -s cuda-10.1 /usr/local/cuda &&     rm -rf /var/lib/apt/lists/*]: exit code: 100

I can fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/InRelease manually, but https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/InRelease returns 404. This is probably not just a temporary failure either, since I have been trying this for several weeks now.

I'm not a CUDA expert and could not find any related issues online. Do you have an idea what the problem might be?
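One detail in the log above may be relevant: the Ubuntu package indexes are fetched for arm64 (from ports.ubuntu.com), while both NVIDIA repository URLs end in x86_64. Whether this contributes to the "Unable to locate package" errors is not confirmed in this thread. The following is only a rough sketch for surfacing such a mismatch from a captured build log. The function names and the alias table are hypothetical, and the sample lines are copied from the log:

```python
import re

# Assumption: treat Debian's "amd64" and the generic "x86_64" as the same
# architecture, and likewise "aarch64" and "arm64".
ALIASES = {"amd64": "x86_64", "aarch64": "arm64"}


def architectures_in_log(log: str) -> dict:
    """Collect architectures of fetched package indexes and of repository URLs."""
    # apt prints lines such as "bionic-updates/main arm64 Packages [2018 kB]".
    index_archs = set(re.findall(r"\s(\w+) Packages\b", log))
    # The NVIDIA URLs in the log have the form ".../repos/ubuntu1804/<arch>".
    repo_archs = set(re.findall(r"/repos/[^/\s]+/(\w+)", log))
    return {"package_indexes": index_archs, "repo_urls": repo_archs}


def has_arch_mismatch(archs: dict) -> bool:
    """True if both sets are non-empty and share no architecture after aliasing."""
    indexes = {ALIASES.get(a, a) for a in archs["package_indexes"]}
    repos = {ALIASES.get(a, a) for a in archs["repo_urls"]}
    return bool(indexes) and bool(repos) and indexes.isdisjoint(repos)


# Two lines copied from the build log in this issue.
sample_log = (
    "Get:6 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 Packages [2018 kB]\n"
    "Err:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease\n"
)

found = architectures_in_log(sample_log)
print(found["package_indexes"], found["repo_urls"], has_arch_mismatch(found))
```

If the build host is in fact an arm64 machine, building for amd64 with docker build --platform linux/amd64 might be worth trying. This is an untested assumption for this repository, not a confirmed fix.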

soniahorchidan commented 1 year ago

Hello!

I am sorry the fix did not work. I will be looking into this problem again soon. In the meantime, please install the dependencies manually and run the code outside the Docker container. I will get back once the problem is solved.

Best regards, Sonia