ponsaravanan2021 opened this issue 6 months ago
@ponsaravanan2021 Could you update Model Builder to the latest version, 17.17.0.2332602, from https://marketplace.visualstudio.com/items?itemName=MLNET.ModelBuilder2022? If you still hit the issue, feel free to let us know.
@ponsaravanan2021 It seems that the GT 520 supports only up to CUDA 10, while GPU training in Model Builder requires CUDA 11 or later.
In addition, your GPU's 1 GB of memory seems too small to load the text classification model and train it. So even if you compiled the Torch runtime against your CUDA version and linked that library, training might still fail.
Hi @zewditu, Model Builder is already at 17.17.0.2332602. It may just be that Model Builder's messaging needs a bit more validation before it confirms that the GPU is compatible with the installed CUDA version. I ran a simple C++ program to check compatibility, and that check failed.
That is understandable, given the GT 520's CUDA and memory limits. What worried me is the compatibility messaging, and that the error only shows up at a later stage. If training fails due to lack of memory, isn't the exception message ("cuda is not available") already wrong or misleading?
Please feel free to close this issue, as I am planning to invest in a new machine with a newer graphics card to continue.
Thank you.
Sorry, I overlooked the version: I was checking it on my Surface. I will check on the old spare machine today, retry, and report back here if anything changes.
System Information:
Describe the bug
To Reproduce
Steps to reproduce the behavior:
1. Go to Model Builder.
2. Choose Text classification under Natural Language Processing.
3. Choose Local GPU under Select training environment (ensure the GPU compatibility check confirms "GPU is CUDA compatible").
4. Choose either SQL Server or a text file as the data source (both lead to the same error).
5. Click Train.

You will see the error "cuda is not available, please confirm you have a cuda-support gpu".

StackTrace:
at Microsoft.ML.ModelBuilder.AutoMLService.TextClassificationExperiment.d_13.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/TextClassificationExperiment.cs:line 66
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLEngine.d_21.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 161
In the Output window:
start text classification
restore "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\COMMUNITY\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\torchsharp.gpu.csproj" --configfile "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\COMMUNITY\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\NuGet.config" -r win-x64 /p:UsingToolXliff=false /p:TorchSharpVersion=0.98.3 /p:TorchSharpCudaRuntimeVersion=1.11.0.1 /p:TensorflowRuntimeVersion=2.3.1 /p:BaseIntermediateOutputPath="C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\obj"
publish "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\COMMUNITY\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\torchsharp.gpu.csproj" -r win-x64 -c Release --no-self-contained -o C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3 --no-restore /p:UsingToolXliff=false /p:TorchSharpVersion=0.98.3 /p:TorchSharpCudaRuntimeVersion=1.11.0.1 /p:TensorflowRuntimeVersion=2.3.1 /p:BaseOutputPath="C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\bin\" /p:BaseIntermediateOutputPath="C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\obj\"
start installing runtime in C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3
Determining projects to restore...
All projects are up-to-date for restore.
MSBuild version 17.8.3+195e7f5a3 for .NET
torchsharp.gpu -> C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\bin\Release\netstandard2.0\win-x64\torchsharp.gpu.dll
torchsharp.gpu -> C:\Users\nitya\AppData\Local\Temp\ModelBuilder\torchsharp-cuda-0.98.3\
install runtime successfully
Expected behavior: Training should complete on the GPU using CUDA, since Model Builder reports the GPU as CUDA compatible.
Screenshots:
![image](https://github.com/dotnet/machinelearning-modelbuilder/assets/94661751/70872ea8-9bb4-4347-a7ae-657bd0095e87)
Additional context: This problem is not isolated to Model Builder. I get the same issue in code with TorchSharp (`using TorchSharp;`) when calling `torch.InitializeDeviceType(DeviceType.CUDA);`.
Package references
Game Ready Driver: This is the latest version; there have been no driver updates since.
![image](https://github.com/dotnet/machinelearning-modelbuilder/assets/94661751/7a35dc8c-9279-4796-a925-113c43f5654d)
This is an old machine I am using to try out GPU training, because my Surface Pro 7 takes ages to train on the CPU.
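A slightly fuller version of the `torch.InitializeDeviceType(DeviceType.CUDA)` repro can help separate "the native CUDA backend failed to load" from "the backend loaded but no usable device was found", which is the distinction the generic "cuda is not available" message blurs. This is a minimal sketch, assuming the referenced TorchSharp version (the Output log shows 0.98.3) exposes `torch.cuda.is_available()` and `torch.cuda.device_count()`, and that a matching TorchSharp CUDA runtime package is referenced; the class name `CudaDiagnostic` is hypothetical.

```csharp
using System;
using TorchSharp;

// Hypothetical diagnostic program; assumes TorchSharp plus a matching CUDA
// runtime package are referenced (package versions are an assumption).
class CudaDiagnostic
{
    static void Main()
    {
        try
        {
            // Same call as in the issue: attempts to load and initialize
            // libtorch's native CUDA backend, and throws if it cannot.
            torch.InitializeDeviceType(DeviceType.CUDA);
        }
        catch (Exception ex)
        {
            // The exception type and message may say more about why the
            // backend failed than Model Builder's generic error does.
            Console.WriteLine($"CUDA backend failed to initialize: {ex.GetType().Name}: {ex.Message}");
            return;
        }

        // Reached only if the backend loaded; report what libtorch can see.
        bool available = torch.cuda.is_available();
        Console.WriteLine($"torch.cuda.is_available(): {available}");
        if (available)
        {
            Console.WriteLine($"torch.cuda.device_count(): {torch.cuda.device_count()}");
        }
    }
}
```

Run on the GT 520 machine, the first branch firing would point at a driver/CUDA-version mismatch rather than at memory; the sketch does not check GPU memory, so a 1 GB card that passes both checks could still fail during training.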