dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

"cuda is not available, please confirm you have a cuda-support gpu" on a pc that is compatible "GPU is CUDA compatible" #2815

Open ponsaravanan2021 opened 6 months ago

ponsaravanan2021 commented 6 months ago

System Information (please complete the following information):

Describe the bug

To Reproduce
Steps to reproduce the behavior:
1) Go to Model Builder
2) Choose Text classification under Natural Language Processing
3) Choose Local GPU in Select Training environment (ensure GPU compatibility confirms "GPU is CUDA compatible")
4) Choose either SQL Server or Text file (either will lead to the same error)
5) Click Train. You will see the error "cuda is not available, please confirm you have a cuda-support gpu"

StackTrace
at Microsoft.ML.ModelBuilder.AutoMLService.TextClassificationExperiment.d_13.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/TextClassificationExperiment.cs:line 66
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLEngine.d_21.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 161

In the output window:

start text classification
restore "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\COMMUNITY\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\torchsharp.gpu.csproj" --configfile "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\COMMUNITY\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\NuGet.config" -r win-x64 /p:UsingToolXliff=false /p:TorchSharpVersion=0.98.3 /p:TorchSharpCudaRuntimeVersion=1.11.0.1 /p:TensorflowRuntimeVersion=2.3.1 /p:BaseIntermediateOutputPath="C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\obj"
publish "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\COMMUNITY\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\torchsharp.gpu.csproj" -r win-x64 -c Release --no-self-contained -o C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3 --no-restore /p:UsingToolXliff=false /p:TorchSharpVersion=0.98.3 /p:TorchSharpCudaRuntimeVersion=1.11.0.1 /p:TensorflowRuntimeVersion=2.3.1 /p:BaseOutputPath="C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\bin\" /p:BaseIntermediateOutputPath="C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\obj\"
start installing runtime in C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3
Determining projects to restore...
All projects are up-to-date for restore.
MSBuild version 17.8.3+195e7f5a3 for .NET
torchsharp.gpu -> C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\bin\Release\netstandard2.0\win-x64\torchsharp.gpu.dll
torchsharp.gpu -> C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\
install runtime successfully

Expected behavior
Training should complete using CUDA, since the GPU is reported as compatible.

Screenshots image image

Additional context
This problem is not isolated to Model Builder. I hit the same issue when I try this in code as well:

using TorchSharp;

torch.InitializeDeviceType(DeviceType.CUDA);

Package references

<PackageReference Include="Microsoft.ML.TorchSharp" Version="0.21.0" />    
<PackageReference Include="PorterStemmer" Version="1.0.0" />    
<PackageReference Include="TorchSharp-cuda-windows" Version="0.101.5" />

Game Ready Driver
This is the latest driver; there are no driver updates after it. image image

This is an old machine. I am trying out its GPU because my Surface Pro 7 takes ages to train on the CPU.

zewditu commented 5 months ago

@ponsaravanan2021 Could you update your Model Builder version to latest version: 17.17.0.2332602? From here https://marketplace.visualstudio.com/items?itemName=MLNET.ModelBuilder2022. If you still hit the issue, feel free to tell us.

LittleLittleCloud commented 5 months ago

@ponsaravanan2021 it seems that the GT 520 supports CUDA only up to version 10, while GPU training in Model Builder requires CUDA 11 or later.

Your GPU's memory (1 GB) also seems too small to load the text classification model and train it, so even if you compile the torch runtime against your CUDA version and link the library, training might still fail.
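The two constraints described above (CUDA version and GPU memory) can be sketched as a small pre-flight check that reports each reason separately. This is only an illustration in Python, not Model Builder's actual compatibility check: the `GpuInfo` type, the `preflight_problems` function, and the 4 GB memory threshold are hypothetical, and the CUDA 10 and 1 GB figures are the ones quoted in this comment.

```python
from dataclasses import dataclass


@dataclass
class GpuInfo:
    """Hypothetical description of a GPU; not a Model Builder or TorchSharp type."""
    name: str
    max_cuda_major: int  # highest CUDA major version the GPU supports
    memory_gb: float     # total device memory in GB


def preflight_problems(gpu, required_cuda_major=11, min_memory_gb=4.0):
    """Return one human-readable message per reason GPU training would fail.

    required_cuda_major=11 comes from the comment above; min_memory_gb=4.0
    is an assumed placeholder, since the thread gives no exact figure.
    """
    problems = []
    if gpu.max_cuda_major < required_cuda_major:
        problems.append(
            f"{gpu.name} supports CUDA up to {gpu.max_cuda_major}, "
            f"but training needs CUDA {required_cuda_major} or later"
        )
    if gpu.memory_gb < min_memory_gb:
        problems.append(
            f"{gpu.name} has {gpu.memory_gb:g} GB of memory, "
            f"below the assumed minimum of {min_memory_gb:g} GB"
        )
    return problems


# Figures for the GT 520 as quoted in this thread.
gt520 = GpuInfo(name="GT 520", max_cuda_major=10, memory_gb=1.0)
for problem in preflight_problems(gt520):
    print(problem)
```

Reporting the CUDA version and memory problems as separate messages keeps the two failure causes distinguishable, instead of collapsing them into a single "cuda is not available" error.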

ponsaravanan2021 commented 5 months ago

> @ponsaravanan2021 Could you update your Model Builder version to latest version: 17.17.0.2332602? From here https://marketplace.visualstudio.com/items?itemName=MLNET.ModelBuilder2022. If you still hit the issue, feel free to tell us.

Hi @zewditu, Model Builder is already at 17.17.0.2332602. It may just be that Model Builder's messaging needs a more thorough check before confirming that the GPU is compatible with the installed version of CUDA. I ran a simple C++ program to check compatibility, and it failed.

ponsaravanan2021 commented 5 months ago

> @ponsaravanan2021 it seems that GT 520 supports until cuda 10, while to run GPU training on model builder, you would need cuda 11 or later.
>
> In the meantime, your GPU also seems to be too small (1GB) to load text classification model and perform training, so it might be possible that even after you compile torch runtime according to your cuda version and link the library, the training would still fail.

That is understandable, but what worried me is the compatibility messaging and the errors that appear at a later stage. If training fails due to lack of memory, then isn't the exception message already wrong or misleading?

ponsaravanan2021 commented 5 months ago

Please feel free to close this issue, as I am planning to invest in a new machine with a newer graphics card to continue further.

Thank you.

ponsaravanan2021 commented 5 months ago

> @ponsaravanan2021 Could you update your Model Builder version to latest version: 17.17.0.2332602? From here https://marketplace.visualstudio.com/items?itemName=MLNET.ModelBuilder2022. If you still hit the issue, feel free to tell us.
>
> Hi @zewditu The model builder is already at 17.17.0.2332602. It may just be the messaging in the model builder needs a bit more evaluation before confirming that the GPU is compatible with the version of CUDA installed. I ran a simple CPP program to check compatibility, but it failed.

Sorry, I overlooked the version. I was checking the version on my Surface. I will check on the old spare machine today, retry, and report back here if anything changes.