IBM / pytorch-large-model-support

Large Model Support in PyTorch
Apache License 2.0

Building pytorch LMS on cuda 11 cards #10

Open Sazoji opened 3 years ago

Sazoji commented 3 years ago

Is there a modified version of PyTorch 1.5.x that's compatible with CUDA >= 11.0 for RTX 30 series cards? I can run small images very quickly on vanilla PyTorch, but LMS is essential because of the RAM constraints on every 30 series card except the 3090. If anyone can direct me to a solution, it would be VERY helpful.

Problem: the patches were built for PyTorch 1.5.0 before the 30 series and compatible CUDA drivers were integrated -> unable to build PyTorch LMS for the graphics card or use the prebuilt PowerAI repo.
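For anyone trying to confirm the same mismatch on their own machine, here is a rough sketch of the check. It assumes the usual CUDA compatibility rule: a binary built for `sm_XY` runs on a device with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for any equal or newer device. The helper names `parse_arch` and `build_supports_device` are made up for illustration, and the arch lists in the usage note are examples, not the actual contents of any particular PyTorch build. RTX 30 series cards report compute capability 8.6.

```python
import re

# Matches tags such as "sm_75", "sm_86" or "compute_80"; a trailing letter
# suffix (as in "sm_90a") is tolerated and ignored.
_ARCH_RE = re.compile(r"(sm|compute)_(\d+)(\d)[a-z]?")


def parse_arch(tag):
    """Return (kind, major, minor) for an arch tag, or None if unrecognised."""
    m = _ARCH_RE.fullmatch(tag)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def build_supports_device(arch_list, capability):
    """Check whether a build's arch list can serve a device capability.

    arch_list  -- list of tags such as ["sm_70", "sm_75"]
    capability -- (major, minor) tuple, e.g. (8, 6) for RTX 30 series cards
    """
    major, minor = capability
    for tag in arch_list:
        parsed = parse_arch(tag)
        if parsed is None:
            continue  # skip tags this sketch does not understand
        kind, a_major, a_minor = parsed
        # Native binaries: same major version, minor not newer than the device.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    # torch.cuda.get_arch_list() is missing in older releases, hence the guard.
    if (
        torch is not None
        and torch.cuda.is_available()
        and hasattr(torch.cuda, "get_arch_list")
    ):
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("CUDA toolkit used for this build:", torch.version.cuda)
        print("Device capability:", cap)
        print("Build arch list:", archs)
        print("Build supports device:", build_supports_device(archs, cap))
```

As a usage example, `build_supports_device(["sm_37", "sm_70", "sm_75"], (8, 6))` returns `False`, which is the situation described above, while `build_supports_device(["sm_80"], (8, 6))` returns `True`.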

Possible solutions:
- find a compatible CUDA 10 driver that detects the 30 series and builds PyTorch 1.5.0
- LMS is integrated into PyTorch proper
- alternate solutions for memory management in training
- LMS is updated to PyTorch 1.7.0
- PyTorch 1.5.0 is patched to support the CUDA 11 toolkit

jayfurmanek commented 3 years ago

I agree with your assessment, but unfortunately I don't have a solution. We no longer have the resources to maintain LMS for newer PyTorch versions, which would bring the CUDA 11 support you need, and we have been unsuccessful in getting it merged upstream.