BiomedSciAI / fuse-med-ml

A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)
Apache License 2.0
137 stars, 34 forks

Fix the installation protocol so it will be compatible with A100 GPU and CCC's CUDA version #261

Closed egozi closed 1 year ago

egozi commented 1 year ago

In interactive mode at CCC I'm getting a CUDA error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I also get this warning (it may be related):

/dccstor/mm_hcls/egozi/anaconda3/envs/fuse/lib/python3.10/site-packages/torch/cuda/__init__.py:146: UserWarning: NVIDIA A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the NVIDIA A100-SXM4-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
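For anyone hitting the same warning, here is a minimal sketch of a check for whether a PyTorch build was compiled for a given GPU. The helper names (`sm_versions`, `is_capability_supported`, `check_current_device`) are made up for illustration. The "same major version" rule is a simplifying assumption that ignores PTX (`compute_XX`) forward compatibility. It relies only on `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`, and torch is imported lazily so the pure check also runs without a GPU.

```python
import re


def sm_versions(arch_list):
    """Extract numeric versions from entries like 'sm_70'; skips 'compute_XX' (PTX) entries."""
    versions = []
    for arch in arch_list:
        match = re.match(r"sm_(\d+)", arch)
        if match:
            versions.append(int(match.group(1)))
    return versions


def is_capability_supported(capability, arch_list):
    """Simplified heuristic (an assumption): supported if any compiled sm_XY shares the GPU's major version."""
    major, _minor = capability
    return any(sm // 10 == major for sm in sm_versions(arch_list))


def check_current_device(index=0):
    """Run the check against the local GPU; returns None when CUDA is unavailable."""
    import torch  # imported lazily so the helpers above work without torch installed

    if not torch.cuda.is_available():
        return None
    return is_capability_supported(
        torch.cuda.get_device_capability(index),
        torch.cuda.get_arch_list(),
    )


# Values taken from the warning above: A100 is sm_80, the build supports sm_37..sm_70.
print(is_capability_supported((8, 0), ["sm_37", "sm_50", "sm_60", "sm_70"]))  # False
```

Calling `check_current_device()` inside the affected environment before and after reinstalling PyTorch should show whether the new build covers the A100.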

egozi commented 1 year ago

Fixed it by first removing torch and torchvision, then running:

conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia

mosheraboh commented 1 year ago

@SagiPolaczek, after you try to fix the issue with the torchvision version, let's decouple the torch installation: specify this step (how to install PyTorch) in the README file, and make sure that fuse doesn't reinstall it.
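One possible way to approach the "doesn't reinstall it" part, as a rough sketch only: filter user-managed packages out of a requirements list when they are already present in the environment. This is not how fuse's setup actually works. The helper names, the `user_managed` tuple, and the example requirement strings are all hypothetical; only `importlib.metadata.version` and `PackageNotFoundError` are standard-library calls.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Return the bare distribution name from a string such as 'torch>=1.5'."""
    return re.split(r"[\s<>=!~\[;]", requirement.strip(), maxsplit=1)[0].lower()


def is_installed(name):
    """True if a distribution with this name is present in the current environment."""
    try:
        metadata.version(name)
        return True
    except metadata.PackageNotFoundError:
        return False


def drop_preinstalled(requirements, user_managed=("torch", "torchvision"), installed=is_installed):
    """Drop user-managed requirements that are already installed; keep everything else."""
    kept = []
    for req in requirements:
        name = requirement_name(req)
        if name in user_managed and installed(name):
            continue  # leave the user's own (e.g. conda-installed) build untouched
        kept.append(req)
    return kept


# Hypothetical requirement list, for illustration only.
example = ["torch>=1.5", "numpy", "torchvision"]
print(drop_preinstalled(example, installed=lambda name: name == "torch"))
# ['numpy', 'torchvision']
```

The `installed` parameter is injectable so the filtering logic can be exercised without depending on what happens to be in the local environment.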