dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Text Classification Scenario - This application or script uses TorchSharp but doesn't contain a reference to libtorch-cuda-11.3-win-x64, Version=1.11.0.1. #2427

Open luisquintanilla opened 1 year ago

luisquintanilla commented 1 year ago

Describe the bug

This application or script uses TorchSharp but doesn't contain a reference to libtorch-cuda-11.3-win-x64, Version=1.11.0.1.

To Reproduce

Steps to reproduce the behavior:

Train a model using the Text Classification scenario with GPU.

Expected behavior

Model trains successfully.

Screenshots

image

Additional context

Log: RestaurantSentiment-PA5G67.txt

luisquintanilla commented 1 year ago

A different error occurs when running a similar workflow.

image

TCLibraryModel-DGG2FT.txt

luisquintanilla commented 1 year ago

Dataset that reproduces the error:

Column To Predict: Risk Category
Text Column: Violation Description

RestaurantScores.zip

luisquintanilla commented 1 year ago

Training with this dataset works.

yelp_labelled.txt

luisquintanilla commented 1 year ago

Workaround: For the Restaurant Scores dataset, the Text Column (Violation Description) has a lot of missing values. Removing the rows with missing values allows you to train successfully.

Suggestion - Look into how TextClassificationTrainer deals with missing values.
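The workaround above could also be applied in code before training, rather than by editing the CSV by hand. The following is a minimal sketch, not a confirmed fix: it assumes the ModelInput schema posted later in this thread, and the MissingTextFilter class and its method names are hypothetical helpers introduced only for illustration. It relies on ML.NET's FilterByCustomPredicate, which drops every row for which the predicate returns true.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Mirrors the schema posted in this thread (assumed column order).
public class ModelInput
{
    [LoadColumn(0)]
    public string InspectionType { get; set; }

    [LoadColumn(1)]
    public string ViolationDescription { get; set; }

    [LoadColumn(2)]
    public string RiskCategory { get; set; }
}

// Hypothetical helper, not part of ML.NET or Model Builder.
public static class MissingTextFilter
{
    // True when the text column is null, empty, or only whitespace.
    public static bool IsMissingText(ModelInput row) =>
        string.IsNullOrWhiteSpace(row.ViolationDescription);

    // Returns a view of the data without the rows whose text column is missing.
    // FilterByCustomPredicate drops the rows for which the predicate is true.
    public static IDataView DropRowsWithMissingText(MLContext ctx, IDataView data) =>
        ctx.Data.FilterByCustomPredicate<ModelInput>(data, IsMissingText);
}
```

In the training code, this would sit between loading and splitting, for example `var cleaned = MissingTextFilter.DropRowsWithMissingText(ctx, data);` followed by `ctx.Data.TrainTestSplit(cleaned, 0.2)`. Whether rows with a missing label (Risk Category) also need to be dropped has not been checked here.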

luisquintanilla commented 1 year ago

This might be a Framework issue. Training with the code generated by Model Builder on a dataset that has empty values produces the following error:

Error

{"CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`\nException raised from gemm at C:\\actions-runner\\_work\\pytorch\\pytorch\\builder\\windows\\pytorch\\aten\\src\\ATen\\cuda\\CUDABlas.cpp:374 (most recent call first):\n00007FF89E60A4C200007FF89E60A460 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]\n00007FF89E609D8E00007FF89E609D40 c10.dll!c10::detail::torchCheckFail [<unknown file> @ <unknown line number>]\n00007FFF94A6D61400007FFF94A2BA30 torch_cuda_cu.dll!at::cuda::zero_ [<unknown file> @ <unknown line number>]\n00007FFF94AB894A00007FFF94A9F450 torch_cuda_cu.dll!at::native::legacy_lstsq_out_cuda [<unknown file> @ <unknown line number>]\n00007FFF94AB99F900007FFF94A9F450 torch_cuda_cu.dll!at::native::legacy_lstsq_out_cuda [<unknown file> @ <unknown line number>]\n00007FFF94ABACFA00007FFF94ABACA0 torch_cuda_cu.dll!at::native::structured_mm_out_cuda::impl [<unknown file> @ <unknown line number>]\n00007FFF94A0C89C00007FFF949C5B90 torch_cuda_cu.dll!at::cuda::view_as_real [<unknown file> @ <unknown line number>]\n00007FFF949773CF00007FFF9493A730 torch_cuda_cu.dll!at::cuda::bucketize_outf [<unknown file> @ <unknown line number>]\n00007FF832C8C44C00007FF832C07140 torch_cpu.dll!at::TensorMaker::make_tensor [<unknown file> @ <unknown line number>]\n00007FF832FC390800007FF832FC3890 torch_cpu.dll!at::_ops::mm::redispatch [<unknown file> @ <unknown line number>]\n00007FF833D3979700007FF833B1A050 torch_cpu.dll!torch::autograd::GraphRoot::apply [<unknown file> @ <unknown line number>]\n00007FF833D04B1A00007FF833B1A050 torch_cpu.dll!torch::autograd::GraphRoot::apply [<unknown file> @ <unknown line number>]\n00007FF832F7F28400007FF832F7F150 torch_cpu.dll!at::_ops::mm::call [<unknown file> @ <unknown line number>]\n00007FF834839AD900007FF834815880 torch_cpu.dll!torch::jit::getUpgraderBytecodeList [<unknown file> @ <unknown line number>]\n00007FF833A859BC00007FF833A856D0 
torch_cpu.dll!torch::autograd::generated::AddmmBackward0::apply [<unknown file> @ <unknown line number>]\n00007FF833A7BBC800007FF833A7B8B0 torch_cpu.dll!torch::autograd::Node::operator() [<unknown file> @ <unknown line number>]\n00007FF8341C1CEA00007FF8341C1640 torch_cpu.dll!torch::autograd::Engine::add_thread_pool_task [<unknown file> @ <unknown line number>]\n00007FF8341C26FB00007FF8341C2340 torch_cpu.dll!torch::autograd::Engine::evaluate_function [<unknown file> @ <unknown line number>]\n00007FF8341C70C300007FF8341C6950 torch_cpu.dll!torch::autograd::Engine::thread_main [<unknown file> @ <unknown line number>]\n00007FF8341C68B600007FF8341C67F0 torch_cpu.dll!torch::autograd::Engine::thread_init [<unknown file> @ <unknown line number>]\n00007FF8341BD6D500007FF8341BCC40 torch_cpu.dll!torch::autograd::Engine::get_base_engine [<unknown file> @ <unknown line number>]\n00007FF92978936300007FF9297892C0 ucrtbase.dll!recalloc [<unknown file> @ <unknown line number>]\n00007FF92A6126BD00007FF92A6126A0 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]\n00007FF92BD4DFB800007FF92BD4DF90 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]\n"}

Code

Training

var ctx = new MLContext();

// Require GPU training: do not fall back to the CPU.
ctx.FallbackToCpu = false;
// Index of the GPU device to train on.
ctx.GpuDeviceId = 1;

var data = ctx.Data.LoadFromTextFile<TCLibModel.ModelInput>(@"C:\Datasets\RestaurantScores.csv", separatorChar: ',', hasHeader: true);

Console.WriteLine("Splitting data...");
var dataSplit = ctx.Data.TrainTestSplit(data, 0.2);

Console.WriteLine("Creating pipeline...");
var pipeline = TCLibModel.BuildPipeline(ctx);

Console.WriteLine("Training model...");
var model = pipeline.Fit(dataSplit.TrainSet);

Console.WriteLine("Evaluating model...");
var predictions = model.Transform(dataSplit.TestSet);
var evaluationMetrics = ctx.MulticlassClassification.Evaluate(predictions, labelColumnName: "RiskCategory");

Console.WriteLine($"Macro-Accuracy: {evaluationMetrics.MacroAccuracy}");
Console.WriteLine($"Micro-Accuracy: {evaluationMetrics.MicroAccuracy}");

Schema

public class ModelInput
{
    [LoadColumn(0)]
    [ColumnName(@"InspectionType")]
    public string InspectionType { get; set; }

    [LoadColumn(1)]
    [ColumnName(@"ViolationDescription")]
    public string ViolationDescription { get; set; }

    [LoadColumn(2)]
    [ColumnName(@"RiskCategory")]
    public string RiskCategory { get; set; }

}

Pipeline

var pipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: @"RiskCategory", inputColumnName: @"RiskCategory")
    .Append(mlContext.MulticlassClassification.Trainers.TextClassification(labelColumnName: @"RiskCategory", sentence1ColumnName: @"ViolationDescription"))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue(outputColumnName: @"PredictedLabel", inputColumnName: @"PredictedLabel"));

luisquintanilla commented 1 year ago

cc @michaelgsharp

luisquintanilla commented 1 year ago

Model Builder team: focus on the issue in the description, since it seems to be related to downloading and locating the correct TorchSharp DLLs. Ignore the rest of the thread, which appears to be an ML.NET Framework issue with handling missing values.

beccamc commented 1 year ago

@LittleLittleCloud Is this being resolved with the GPU fixes for Sentence Similarity?