giswqs opened 11 months ago
I am not sure how we could resolve this. In general, the recommendation is to be explicit if one wants cuda. Tagging @hmaarrfk, who may have additional insights on the latest tips and tricks.
I guess that because autogluon doesn't depend on cuda itself, it picks the CPU variant by default. I'm not sure if there's a way to prefer the GPU variant without explicitly specifying it in some way. Maybe there is; I'm not at all familiar with this aspect of mamba.
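(One way to be explicit, sketched here as an assumption based on the build-string spec used later in this thread, is to pin pytorch's cuda build string directly:)
mamba install -c conda-forge autogluon "pytorch=*=*cuda*"
# matches any pytorch build whose build string contains "cuda"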
This command can install the pytorch cuda version properly:
mamba install -c conda-forge pytorch
Since autogluon depends on pytorch, I would expect this command to install the pytorch cuda version properly, but it doesn't:
mamba install -c conda-forge autogluon
This command can install the pytorch cuda version properly, but I just could not understand why. Since autogluon depends on pytorch, installing autogluon pytorch and installing autogluon alone should make no difference, but that is not the case here:
mamba install -c conda-forge autogluon pytorch
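(For reference, one way to check which variant actually ended up in the env is to look at the build string; a minimal sketch, assuming the env is activated:)
mamba list pytorch
# the Build column shows e.g. cpu_... for the cpu variant, or cuda118_... / cuda120_... for the gpu variants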
I'm not sure myself; maybe @wolfv knows whether this is expected behaviour with mamba.
This is likely a corner case of how the __cuda arch spec was designed; @wolfv is definitely the one to know all the details (about what is potentially tripping up the solver here) 👀
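(For context, __cuda is a virtual package that conda/mamba derives from the detected driver; a quick way to see what the solver thinks is available, as a sketch:)
conda info | grep -A3 "virtual packages"
# on a gpu node this should list something like __cuda=12.0 among the virtual packages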
@giswqs for completeness, and if you don't mind, could you test the behavior with conda and micromamba? Or I can test it if I manage to get an allocation before you get to it with your copies of micromamba/conda.
@ngam Thanks for the suggestion. I will give it a try tomorrow. It is midnight here now. Off to bed shortly.
Okay, I checked on a cluster with gpus. I believe the problem here is that cuda120 takes higher precedence than cuda118, thus xgboost in your env will get cuda120, but that conflicts with the cuda120 pytorch, so it gives you the cpu version. All of this is due to something in autogluon (deps-wise). Here's a readout:
micromamba create -n test_ag_2_mic2 autogluon pytorch=*=*cuda120*
conda-forge/linux-64 Using cache
conda-forge/noarch Using cache
error libmamba Could not solve for environment specs
The following packages are incompatible
├─ autogluon is installable with the potential options
│ ├─ autogluon 0.6.2 would require
│ │ └─ autogluon.timeseries 0.6.2 , which requires
│ │ └─ pytorch <1.13,>=1.9 , which can be installed;
│ ├─ autogluon [0.7.0|0.8.0|0.8.1] would require
│ │ └─ autogluon.timeseries [0.7.0 |0.8.0 |0.8.1 ], which requires
│ │ └─ pytorch >=1.9,<1.14 , which can be installed;
│ ├─ autogluon 0.8.2 would require
│ │ ├─ autogluon.multimodal 0.8.2 with the potential options
│ │ │ ├─ autogluon.multimodal 0.8.2 would require
│ │ │ │ └─ pytorch >=1.9,<1.14 , which can be installed;
│ │ │ └─ autogluon.multimodal 0.8.2 would require
│ │ │ ├─ pytorch >=2.0,<2.1 with the potential options
│ │ │ │ ├─ pytorch 2.0.0 conflicts with any installable versions previously reported;
│ │ │ │ ├─ pytorch 2.0.0, which can be installed;
│ │ │ │ ├─ pytorch 2.0.0, which can be installed;
│ │ │ │ └─ pytorch 2.0.0, which can be installed;
│ │ │ └─ torchvision >=0.15.0,<0.16.0 with the potential options
│ │ │ ├─ torchvision [0.15.1|0.15.2] would require
│ │ │ │ └─ pytorch * cpu*, which can be installed;
│ │ │ ├─ torchvision [0.15.1|0.15.2] would require
│ │ │ │ └─ pytorch [2.0 cuda112*|2.0.* cuda112*], which can be installed;
│ │ │ └─ torchvision 0.15.2 would require
│ │ │ └─ pytorch 2.0 cpu*, which can be installed;
│ │ ├─ autogluon.tabular 0.8.2 with the potential options
│ │ │ ├─ autogluon.tabular 0.8.2 would require
│ │ │ │ └─ pytorch >=1.9,<1.14 , which can be installed;
│ │ │ └─ autogluon.tabular 0.8.2 would require
│ │ │ └─ pytorch >=1.13,<2.1 with the potential options
│ │ │ ├─ pytorch 2.0.0 conflicts with any installable versions previously reported;
│ │ │ ├─ pytorch [1.13.0|1.13.1], which can be installed;
│ │ │ ├─ pytorch 2.0.0, which can be installed;
│ │ │ ├─ pytorch 2.0.0, which can be installed;
│ │ │ └─ pytorch 2.0.0, which can be installed;
│ │ └─ autogluon.timeseries 0.8.2 with the potential options
│ │ ├─ autogluon.timeseries [0.7.0|0.8.0|0.8.1|0.8.2], which can be installed (as previously explained);
│ │ └─ autogluon.timeseries 0.8.2 would require
│ │ └─ pytorch >=1.13,<2.1 with the potential options
│ │ ├─ pytorch 2.0.0 conflicts with any installable versions previously reported;
│ │ ├─ pytorch [1.13.0|1.13.1], which can be installed;
│ │ ├─ pytorch 2.0.0, which can be installed;
│ │ ├─ pytorch 2.0.0, which can be installed;
│ │ └─ pytorch 2.0.0, which can be installed;
│ └─ autogluon 1.0.0 would require
│ ├─ autogluon.multimodal 1.0.0 , which requires
│ │ └─ torchvision >=0.15.0,<0.16.0 , which can be installed (as previously explained);
│ └─ autogluon.timeseries 1.0.0 , which requires
│ └─ pytorch >=2.0,<2.1 with the potential options
│ ├─ pytorch 2.0.0 conflicts with any installable versions previously reported;
│ ├─ pytorch 2.0.0, which can be installed;
│ ├─ pytorch 2.0.0, which can be installed;
│ └─ pytorch 2.0.0, which can be installed;
└─ pytorch * *cuda120* is not installable because it conflicts with any installable versions previously reported.
critical libmamba Could not solve for environment specs
Hope this makes sense. I am not super familiar with the solver, but that's my interpretation. When you specify "pytorch" for the solver (e.g., in the call), it takes precedence over xgboost and others, and so you get the cuda118 versions of everything (because that's the highest available option for your env, with pytorch taking higher precedence than things like xgboost). Note that the cuda120 versions of pytorch and xgboost don't conflict (i.e., micromamba create -n test xgboost pytorch
yields cuda120 versions of both in harmony).
@ngam's analysis looks solid to me. CUDA migrations should become smoother going forward (the setup is much improved as of CUDA 12), but for now, the best way to solve this would be to figure out which dependencies aren't built for CUDA 12 yet, and help those get built.
@ngam Thank you very much for looking into it. Your explanation makes a lot of sense.
It appears that this autogluon.multimodal dependency restriction causes the issue:
torchvision >=0.15.0,<0.16.0
Only torchvision v0.16.1 supports cuda 120. If autogluon.multimodal can raise the torchvision upper bound to allow 0.16.1, then
mamba install -c conda-forge autogluon
should be able to pull in cuda120.
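(A quick way to double-check which torchvision builds exist for which cuda version, sketched with conda search against conda-forge:)
conda search -c conda-forge "torchvision>=0.15,<0.17" | grep cuda
# assumption: the cuda version shows up in the build string, as it does for pytorch in the readout above;
# if no 0.15.x line mentions cuda120, the <0.16.0 pin indeed rules out cuda 12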
Comment:
@h-vetinari @PertuyF @dhirschfeld @ngam @arturdaraujo Thank you all for your help with the autogluon conda-forge packages earlier this year. We recently ran into a strange issue with the autogluon conda-forge installation. @suzhoum and I have spent a few days debugging the issue but still could not figure it out. The issue is that if we specifically add pytorch to the mamba installation command, it installs the pytorch cuda version properly. Without pytorch in the installation command, it will only install the cpu version. We would greatly appreciate your advice on this issue.
Create a conda env
This installs the pytorch cpu version
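(Presumably the command discussed above, i.e.:)
mamba install -c conda-forge autogluon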
This installs the pytorch cuda version
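(And presumably:)
mamba install -c conda-forge autogluon pytorch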