vnm-neurodesk closed this issue 2 months ago
@vennand - could you test this container and see if it all works as expected?
How do I transfer data to the neurodesktop to test? It opens as expected, but I need to launch a job to know if it'll work.
Are you running neurodesktop locally in docker? If yes, you have a shared directory between the desktop and the host.
Alternatively, you can drag and drop files onto the desktop and Guacamole will upload the file (it has to be a single file; directories aren't supported).
I'm trying locally in docker, and I just noticed the directory, thanks!
Is it possible to do a GPU passthrough with the local Docker? I'm pretty sure I won't be able to test whether the GPU settings work otherwise. Though so far, there was no error message saying it was CPU-only.
I'm also not convinced it compiled with GPU support if the machine that built the container didn't have a GPU. With the new version of relion (5.0), they explicitly state that the compiler tries to detect a GPU and, if it finds none, compiles for CPU only, even if a GPU architecture is provided.
Dear @vennand
yes, you can pass your GPU into the docker container:
```shell
sudo docker run \
  --shm-size=1gb -it --privileged --user=root --name neurodesktop \
  -v ~/neurodesktop-storage:/neurodesktop-storage \
  -e NB_UID="$(id -u)" -e NB_GID="$(id -g)" \
  --gpus all \
  -p 8888:8888 -e NEURODESKTOP_VERSION=2024-01-12 \
  vnmd/neurodesktop:2024-01-12
```
To check whether it worked, run `nvidia-smi` in the desktop container afterwards.
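For example, from the host, using the container name from the `docker run` command above:

```shell
# Run nvidia-smi inside the running desktop container
sudo docker exec neurodesktop nvidia-smi
```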
That would be annoying if it needs a GPU to compile. We do not have the ability to run a GPU node for building containers.
We might just be limited to version 4 for now then. As far as I can tell, version 5 is still in beta, so it might not be advisable to use it for research anyway.
I tried running your command, but I get the following error message:

```
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled
```
I didn't find anything relevant with a very quick Google search. Any idea what could cause this?
Also, I won't be able to touch this until the 16th of April unfortunately, but I plan on getting back to it.
Dear @vennand
Did you install the nvidia-container-toolkit beforehand?
```shell
# RHEL/CentOS (yum-based)
sudo yum install nvidia-container-toolkit -y

# Ubuntu/Debian (apt-based)
sudo apt install nvidia-container-toolkit -y
```
I had not, but I get the same error after installing it
What do you get when you run `nvidia-smi` on your host system?
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      Off | 00000000:01:00.0 Off |                  Off |
| N/A   18C    P8               9W / 250W |      4MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1536      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
```
can you try this? https://www.howtogeek.com/devops/how-to-use-an-nvidia-gpu-with-docker-containers/
It needs a restart of the Docker daemon and potentially `apt-get install -y nvidia-docker2`.
I've installed nvidia-docker2, but I've also run this:

```shell
sudo nvidia-ctk runtime configure --runtime=docker
```

I don't know which one did it, but it works now. I'll try to test it, but I don't know if I'll have time.
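For anyone hitting the same error later, the combined sequence that appears to have fixed it is roughly the following (apt-based host; the CUDA image tag in the final check is just an example):

```shell
# 1. Install the NVIDIA container toolkit
sudo apt install -y nvidia-container-toolkit

# 2. Register the NVIDIA runtime with Docker
sudo nvidia-ctk runtime configure --runtime=docker

# 3. Restart the Docker daemon so it picks up the new runtime
sudo systemctl restart docker

# 4. Sanity check: this should print the host's GPU table
sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```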
Hi @stebo85,
I've finished testing. Relion works as intended, but none of the jobs showed up when running `nvidia-smi` inside the desktop, even though we could see the GPU being used. Not sure if that's an issue with the GPU passthrough, but it is using the GPU.
Another important issue is that one of the third-party programs I install along with relion doesn't work: CTFFIND 4.1.14 fails if it's compiled with GCC 8 or above. The fix I've found is to modify the source code, which doesn't seem practical or elegant to do in the neurodesk script. What would be the best approach here? Should I host a "fixed" copy of the code on my own GitHub? (Though I'm not sure the license agreement allows this.)
Dear @vennand,
which command did you use for testing the GPUs? I have seen similar behaviour once when using the old flag. Can you try with `--gpus all`? Another check: what comes up when you run `which nvidia-smi`?
Fixing software on the fly for a container is a tricky one. I have done various things in the past, depending on the project:
1) Apply a `sed` command that fixes the few offending lines in the neurocontainer build script. Would that work for you?
2) Provide a fixed source-code file in the neurocontainers repository along with the build script, and copy it into the container during the build to overwrite the upstream file.
3) Fork the software, fix it in the fork, use the fork inside the container, and offer the fix upstream in the hope they merge it.
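For option 1, a hypothetical sketch (the file name and the pattern are placeholders; the real ones depend on whatever the CTFFIND fix turns out to be):

```shell
# Stand-in for the upstream source file (placeholder content)
echo 'char buffer[old_size];' > example_source.cpp

# In a neurocontainer build script, sed would patch the offending
# line in place before compilation:
sed -i 's/old_size/new_size/' example_source.cpp

cat example_source.cpp   # -> char buffer[new_size];
```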
@stebo85
To test the GPU, I simply watched `nvidia-smi` (`watch -n 1 nvidia-smi`) while running relion. Relion launches Python scripts that normally show up there. They didn't in the VM, but they were listed on the main machine (the one I'm running neurodesk from).
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      Off | 00000000:01:00.0 Off |                  Off |
| N/A   31C    P0              74W / 250W |  24256MiB / 24576MiB |     67%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1632      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      7835      C   ...relion-4.0.1.sm61/bin/relion_refine    24250MiB |
+---------------------------------------------------------------------------------------+
```
I don't know exactly how the code accesses the GPU, but I can probably find out if that's relevant.
When I run `which nvidia-smi` I get `/usr/bin/nvidia-smi`.
Regarding fixing the software, I think I'll go with option 2, since the source code is only 11MB. Do you want me to push the fix now, or should we investigate the GPU "issue" before?
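For clarity, the overwrite step would look roughly like this in the build script (file names here are illustrative, not the actual CTFFIND paths):

```shell
# Stand-ins for the upstream source tree and the fixed copy kept in the repo
mkdir -p ctffind_src fixed_files
echo 'line broken under GCC 8+' > ctffind_src/example.cpp
echo 'fixed line' > fixed_files/example.cpp

# During the container build: overwrite the upstream file with the fixed one
cp fixed_files/example.cpp ctffind_src/example.cpp

cat ctffind_src/example.cpp   # -> fixed line
```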
Interesting. I don't know what causes this behaviour, but I guess if it works, it works, no matter where the GPU tasks show up.
Happy for you to push the fix now :) Let's see if we can get this to work!
@stebo85 Would you know what this error means?
```
$ bash build.sh -ds
Entering Debug mode
WARNING: Skipping neurodocker as it is not installed.
Defaulting to user installation because normal site-packages is not writeable
Collecting https://github.com/ReproNim/neurodocker/tarball/master
  Downloading https://github.com/ReproNim/neurodocker/tarball/master
     - 77.3 kB 10.0 MB/s 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [36 lines of output]
      /tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/setuptools_scm/_integration/setuptools.py:31: RuntimeWarning:
      ERROR: setuptools==59.6.0 is used in combination with setuptools_scm>=8.x
      Your build configuration is incomplete and previously worked by accident!
      setuptools_scm requires setuptools>=61
      Suggested workaround if applicable:
      - migrating from the deprecated setup_requires mechanism to pep517/518
        and using a pyproject.toml to declare build dependencies
        which are reliably pre-installed before running the build tools
        warnings.warn(
      Traceback (most recent call last):
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
          return hook(metadata_directory, config_settings)
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/build.py", line 112, in prepare_metadata_for_build_wheel
          directory = os.path.join(metadata_directory, f'{builder.artifact_project_id}.dist-info')
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/builders/wheel.py", line 825, in artifact_project_id
          self.project_id
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/builders/plugin/interface.py", line 374, in project_id
          self.__project_id = f'{self.normalize_file_name_component(self.metadata.core.name)}-{self.metadata.version}'
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/metadata/core.py", line 149, in version
          self._version = self._get_version()
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/metadata/core.py", line 248, in _get_version
          version = self.hatch.version.cached
        File "/tmp/pip-build-env-aeraeba5/overlay/local/lib/python3.10/dist-packages/hatchling/metadata/core.py", line 1466, in cached
          raise type(e)(message) from None
      LookupError: Error getting the version from source `vcs`: setuptools-scm was unable to detect version for /tmp/pip-req-build-wu94yd8o.
      Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.
      For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
      [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```
Yes, you need to update the GitHub URL of neurodocker. As the error output suggests: instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
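In practice, that means installing neurodocker from the git repository rather than the tarball, so setuptools-scm can read the version metadata from the `.git` folder:

```shell
# Install neurodocker from a proper git clone (tarballs lack .git metadata)
pip install git+https://github.com/ReproNim/neurodocker.git
```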
Thank you
Steffen
@stebo85
Hey, I'm back working on this. I'll start implementing the other software soon.
But first, I tested this version of relion on our other GPUs, and it runs without issues. Perhaps the default setting (sm35) is too old, but this one works. I'm thinking it would be simpler for users to only package this one. If you think this could be a good idea, how do we go about this? Only put this one in the JSON, with Exec: relion?
Great to hear that Relion is working :)
OK, it makes sense that the newer version works better. CUDA is usually quite backwards-compatible, so with fairly current driver versions that checks out.
Yes, put the version you found working best in the apps.json and this will trigger the release process.
Thank you for getting this to work!!!
There is a new container by @stebo85, use this command to test:
If test was successful, then add to apps.json to release: https://github.com/NeuroDesk/neurocommand/edit/main/neurodesk/apps.json
Please close this issue when completed :)