dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Incorrect throwing during data loading #2389

Open kant2002 opened 5 years ago

kant2002 commented 5 years ago

System information

Issue

Let's say we load data from a CSV file using a simple POCO class and forget to add the LoadColumn attribute to its properties. The call to CreateTextLoader<T>/CreateTextReader<T> then fails with a NullReferenceException:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.ML.Data.TextLoader.CreateTextReader[TInput](IHostEnvironment host, Boolean hasHeader, Char separator, Boolean allowQuotedStrings, Boolean supportSparse, Boolean trimWhitespace)
   at MLConsoleApp1.Program.Main(String[] args) in MLConsoleApp1\Program.cs:line 54

which is definitely not user friendly. I tracked it down to this line: https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs#L1344

where the assertion is delegated to IHostEnvironment. Since I am running LocalEnvironment, I believe the proper default behavior in a local environment would be to just throw. I could not even imagine that MS had made such a big usability mistake, so I had to manually clone the project and compile it locally to track down this error.

Source code / logs

public class SentimentRow
{
    public bool Sentiment { get; set; }

    public string SentimentText { get; set; }
}
...
var mlContext = new MLContext();
var reader = mlContext.Data.CreateTextLoader<SentimentRow>(hasHeader: true);

kant2002 commented 5 years ago

Small note: in the Debug configuration the assertion fails with the correct exception, but in Release mode the assertion produces no exception at all.
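
The Debug/Release difference described above matches how conditional assertions behave in .NET in general. The sketch below is not the ML.NET code; it uses System.Diagnostics.Debug.Assert, which is marked [Conditional("DEBUG")], to show why an assert-only check can surface as a NullReferenceException in Release. The class and method names are hypothetical and exist only for illustration.

```csharp
using System;
using System.Diagnostics;

public static class AssertDemo
{
    // Hypothetical helper: validates its input with Debug.Assert only.
    public static int LengthWithAssertOnly(string value)
    {
        // Debug.Assert is marked [Conditional("DEBUG")], so the compiler
        // removes this call when DEBUG is not defined (typical Release builds).
        Debug.Assert(value != null, "value must not be null");

        // With the assert compiled out, a null input fails here instead,
        // as a NullReferenceException with no helpful message.
        return value.Length;
    }

    // Hypothetical helper: validates its input in every build configuration.
    public static int LengthWithExplicitCheck(string value)
    {
        if (value == null)
            throw new ArgumentNullException(nameof(value), "value must not be null");

        return value.Length;
    }

    public static void Main()
    {
        try
        {
            LengthWithExplicitCheck(null);
        }
        catch (ArgumentNullException ex)
        {
            // The explicit check names the offending parameter in Debug and Release alike.
            Console.WriteLine($"Rejected parameter: {ex.ParamName}");
        }
    }
}
```

An explicit check like the second method behaves the same in both configurations, which is the kind of behavior the issue asks for.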

towerofpower256 commented 5 years ago

I appear to be having a similar issue.

The file itself exists. I've put the check below before the call and it passes without issue.

if (!File.Exists(filename)) throw new Exception("The data file doesn't exist.");

Calling the code below throws a NullReferenceException; the stack trace follows it.

var dataView = mlContext.Data.LoadFromTextFile<WineQualityData>(filename, separatorChar: ';', hasHeader: true);

System.NullReferenceException
  HResult=0x80004003
  Message=Object reference not set to an instance of an object.
  Source=Microsoft.ML.Data
  StackTrace:
   at Microsoft.ML.Data.TextLoader.CreateTextLoader[TInput](IHostEnvironment host, Boolean hasHeader, Char separator, Boolean allowQuoting, Boolean supportSparse, Boolean trimWhitespace, IMultiStreamSource dataSample)
   at Microsoft.ML.TextLoaderSaverCatalog.LoadFromTextFile[TInput](DataOperationsCatalog catalog, String path, Char separatorChar, Boolean hasHeader, Boolean allowQuoting, Boolean trimWhitespace, Boolean allowSparse)
   at TestMSML.Program.DoThing(String filename) in ...

If it helps, I'm working on a .NET Framework 4.6.1 console application (not .NET Core).

Any thoughts on a workaround?

towerofpower256 commented 5 years ago

Some more info. It works if you create the text loader and define the data structure manually instead of using a class.

Using a class doesn't work:

var textReader = mlContext.Data.CreateTextLoader<WineQualityData>(separatorChar: ';', hasHeader: true);

Creating the structure manually does work:

var textLoader = mlContext.Data.CreateTextLoader(new TextLoader.Column[] {
    new TextLoader.Column("fixed acidity", DataKind.Single, 0),
    new TextLoader.Column("volatile acidity", DataKind.Single, 1),
    ...
    new TextLoader.Column("quality", DataKind.Single, 9),
}, separatorChar: ';', hasHeader: true);

memsido commented 5 years ago

Make sure the training data structure has a LoadColumn mapping on every field:

public class SentimentRow
{
    [LoadColumn(0)]
    public bool Sentiment { get; set; }

    [LoadColumn(1)]
    public string SentimentText { get; set; }
}
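
Until the loader reports the problem itself, a pre-flight check can catch a missing mapping before the loader is created. The sketch below assumes the Microsoft.ML package is referenced. LoadColumnCheck, MissingLoadColumn and MappedSentimentRow are hypothetical names, not part of ML.NET, and the check is deliberately strict: it flags every public instance property or field without [LoadColumn], which may be more than the loader itself requires.

```csharp
using System;
using System.Linq;
using System.Reflection;
using Microsoft.ML.Data;

// The class from the issue, without any mapping attributes.
public class SentimentRow
{
    public bool Sentiment { get; set; }

    public string SentimentText { get; set; }
}

// Hypothetical annotated variant, matching the fix suggested above.
public class MappedSentimentRow
{
    [LoadColumn(0)]
    public bool Sentiment { get; set; }

    [LoadColumn(1)]
    public string SentimentText { get; set; }
}

// Hypothetical helper, not part of ML.NET.
public static class LoadColumnCheck
{
    // Returns the names of public instance properties and fields of T
    // that do not carry a [LoadColumn] attribute.
    public static string[] MissingLoadColumn<T>()
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

        return typeof(T).GetProperties(flags).Cast<MemberInfo>()
            .Concat(typeof(T).GetFields(flags))
            .Where(m => m.GetCustomAttribute<LoadColumnAttribute>() == null)
            .Select(m => m.Name)
            .ToArray();
    }

    public static void Main()
    {
        var missing = MissingLoadColumn<SentimentRow>();
        if (missing.Length > 0)
        {
            // Report the unmapped members by name instead of failing later
            // with a NullReferenceException inside the loader.
            Console.WriteLine("Missing [LoadColumn] on: " + string.Join(", ", missing));
        }

        Console.WriteLine("Unmapped members in annotated class: "
            + MissingLoadColumn<MappedSentimentRow>().Length);
    }
}
```

Calling the check right before CreateTextLoader<T> or LoadFromTextFile<T> turns the silent Release-mode failure into a message that names the unmapped members.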