dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

decision to throw an exception in here if labelCount is 1 has compromised learning resources #6863

Open wvaughn409 opened 11 months ago

wvaughn409 commented 11 months ago
So I've just talked with @codemzs and we decided the best option is to simply throw an exception in here if `labelCount` is 1:

https://github.com/dotnet/machinelearning/blob/c4e4263188dccf16903b8f3fea7e65213a69c6e3/src/Microsoft.ML.Vision/ImageClassificationTrainer.cs#L606

Instead of trying to make all the different changes required to support the corner case of having only 1 class represented on the dataset.

Originally posted by @antoniovs1029 in https://github.com/dotnet/machinelearning/issues/4660#issuecomment-574884432

@antoniovs1029 This "corner case," as you phrased it, of having only one class represented in the dataset has caused a great deal of confusion and frustration for eager young minds attempting to break into machine learning within Visual Studio. The ML.NET Model Builder online documentation offers many learning resources; one of the more popular is "Tutorial: Train an ML.NET classification model to categorize images," found here:

https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/image-classification

The tutorial instructs the user to download an "assets.zip" file: https://github.com/dotnet/samples/blob/main/machine-learning/tutorials/TransferLearningTF/image-classifier-assets.zip

...and the Inception v1 model: https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip

...and then further steps in the tutorial allow the user to select Local (GPU) at the training environment step. After three attempts that installed the wrong CUDA version containing incorrect binaries, I finally found the correct version, only to be greeted with your unfortunate exception during the Train step.

I then spent hours compiling the machine learning repo from source (one little gotcha: I had to change the ps1 script to include the --configuration and --platform arguments so that it would compile successfully without needing the Microsoft.ML.CpuMath native DLL files) and tried to circumvent your exception by changing your code. After all that, I stumbled upon a much easier win...

Then I found this tutorial: https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/image-classification-model-builder

Using its assets instead of the original files supplied by the earlier tutorial (not involving Azure, cloud, etc.), I was able to train the model successfully and generate a solution using the GPU.


My suggestion to resolve this issue is to choose one of the following:

  1. Take down the learning resource here as it only creates confusion when users are unable to follow the steps successfully: https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/image-classification

  2. Update the aforementioned resource, specifically the downloadable assets.zip and Inception model, so that the dataset includes a second class.

  3. Restore the corner case of allowing a single class to be present.
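
For anyone hitting the same exception, a quick pre-check along these lines may help confirm whether a dataset contains at least two classes before training starts. This is only a minimal sketch in Python, not ML.NET code: the function names are hypothetical, and it assumes one subfolder per label, which may not match how a given tutorial's assets are organized.

```python
from collections import Counter
from pathlib import Path


def count_labels(dataset_dir):
    """Count images per label, assuming one subfolder per label (hypothetical layout)."""
    counts = Counter()
    for image_path in Path(dataset_dir).glob("*/*"):
        if image_path.is_file():
            # The parent folder name is treated as the class label.
            counts[image_path.parent.name] += 1
    return counts


def check_label_count(counts):
    """Raise early if fewer than two classes are present, similar in spirit to the trainer's guard."""
    if len(counts) < 2:
        raise ValueError(
            f"Found {len(counts)} class(es); image classification needs at least 2."
        )
    return len(counts)
```

Calling `check_label_count(count_labels("path/to/assets"))` on an unpacked dataset folder would either return the number of classes or fail with a readable message before any GPU setup is attempted.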

It may be tempting to do nothing in this particular scenario, since Microsoft prioritizes learning activities that use Azure cloud resources over on-premises installations (such as local VS 2022 plus the CUDA Toolkit) for the monetary gain reaped from cloud subscription fees. I would remind you of Microsoft's commitment to support its entire product line, not just the high-dollar solutions, and of its assurances to the coding community about transparency, support for open source technology, and low barriers to entry for all things machine learning. With OpenAI's chatbot available free of charge, I don't think MS can afford to lobotomize its more economical offerings in this way, unless y'all are playing the short game these days. [stepping off soap-box...]

Also, please give me a job (former employee).

-Christian (v-chrisv)