Open Elantonio opened 1 year ago
Can you try the "Data Classification" scenario? You are using a regression model, and that's why you get a numeric result.
Since it is possible to mark the Label column as Categorical, I assumed that had some use. So I marked the Label column as categorical, assuming the model would then choose only from the 3 available values (taking the one closest to the regression result). I could round the result and get the category that way, but then what is the use of being able to mark the Label column as Categorical?
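The rounding workaround mentioned above can be sketched in plain C#. This is only an illustration of snapping a raw regression score to the closest known label value; the `CategorySnapper` class and `SnapToNearest` method are hypothetical names and are not part of ML.NET or Model Builder.

```csharp
using System;
using System.Collections.Generic;

public static class CategorySnapper
{
    // Returns the category value closest to the raw regression score.
    // Ties are resolved in favour of the category listed first.
    public static double SnapToNearest(double score, IReadOnlyList<double> categories)
    {
        if (categories == null || categories.Count == 0)
            throw new ArgumentException("At least one category is required.", nameof(categories));

        double best = categories[0];
        foreach (double candidate in categories)
        {
            if (Math.Abs(candidate - score) < Math.Abs(best - score))
                best = candidate;
        }
        return best;
    }
}
```

With the label values from this issue, `CategorySnapper.SnapToNearest(-0.01, new[] { -1.0, 0.0, 1.0 })` returns `0`. Note that this only post-processes a regression output; it does not make the model itself categorical.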
Sorry, maybe I didn't make myself clear. What I mean is that you can choose the Data Classification card on the first Scenario tab (see the picture below). Based on the screenshot you shared with us, it looks like you picked the Value Prediction card, which uses a regression model to fit a numeric label, and that's why you are getting a non-categorical prediction value.
https://user-images.githubusercontent.com/16876986/265224416-ebbdfe4a-156f-4f5d-92ec-c7dc9478e424.png
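For anyone driving AutoML from code rather than from the Model Builder UI, the counterpart of the Data Classification card would be a multiclass classification experiment. The following is a rough sketch only, assuming the Microsoft.ML and Microsoft.ML.AutoML packages are referenced; `ClassificationSketch`, `dataPath` and `labelName` are hypothetical placeholders, and the 60-second budget is arbitrary.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

public static class ClassificationSketch
{
    // dataPath and labelName are hypothetical placeholders for a CSV file and its label column.
    public static void Run(string dataPath, string labelName)
    {
        var mlContext = new MLContext();

        // Infer the column layout from the file, as the regression code in this thread does.
        ColumnInferenceResults columnInference =
            mlContext.Auto().InferColumns(dataPath, labelColumnName: labelName, groupColumns: false);
        TextLoader loader = mlContext.Data.CreateTextLoader(columnInference.TextLoaderOptions);
        IDataView data = loader.Load(dataPath);

        // A multiclass experiment treats the label values as classes rather than as numbers.
        var experiment = mlContext.Auto().CreateMulticlassClassificationExperiment(60);
        ExperimentResult<MulticlassClassificationMetrics> result =
            experiment.Execute(data, labelColumnName: labelName);

        Console.WriteLine($"Best trainer: {result.BestRun.TrainerName}, " +
                          $"micro-accuracy: {result.BestRun.ValidationMetrics.MicroAccuracy:0.000}");
    }
}
```

Whether a numeric label column such as -1/0/1 is accepted as-is may depend on the AutoML version, so treat this as a starting point rather than a verified recipe.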
Hi,
You explained it very clearly. The point I was making is that, by my logic, in regression a value marked as categorical and having only 3 values should be predicted as categorical, and thus as one of the 3 values (taking the values available in the training set as the categories). Otherwise there is no use or effect in declaring the prediction column as a category. And if it has no use or effect, it should not be possible to set the prediction column as a category.
Thanks for your effort and thinking with me!
Best Rgds, Ton
From: Xiaoyun Zhang @.*** Sent: zaterdag 2 september 2023 21:01 To: dotnet/machinelearning-modelbuilder Cc: ElAntonio; Author Subject: Re: [dotnet/machinelearning-modelbuilder] Categorical prediction handled as non-catagorical. (Issue #2763)
Sorry to bother you, but do you know where I can report a bug?
BUG:
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task`1.get_Result()
at Microsoft.ML.AutoML.AutoMLExperiment.Run()
at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, String labelColumnName, String samplingKeyColumn, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at cAlgo.Predictor.<>c__DisplayClass12_0.
Source:
public async Task TrainAsync()
{
    IsTrained = false;
    // Extract header row
    try
    {
        var FileSizeMB = (double)(new FileInfo(TrainData.DataPath).Length) / 1024 / 1024;
        var TrainTimeInSeconds = (uint)Math.Min(10, (FileSizeMB * FileSizeMB * TrainTimeFactor / 100 * 60).RoundUp(0));
        ColumnInferenceResults columnInference =
            MyContext.Auto().InferColumns(TrainData.DataPath, labelColumnName: LabelName, groupColumns: false);
        foreach (var name in TrainData.CategoricalColumnNames)
        {
            columnInference.ColumnInformation.NumericColumnNames.Remove(name);
            columnInference.ColumnInformation.CategoricalColumnNames.Add(name);
        }
        // Create IDataView from data
        TextLoader loader = MyContext.Data.CreateTextLoader(columnInference.TextLoaderOptions);
        IDataView DataView = loader.Load(TrainData.DataPath);
        DataView = MyContext.Data.ShuffleRows(DataView);
        // Define experiment settings
        var experimentSettings = new RegressionExperimentSettings
        {
            MaxExperimentTimeInSeconds = TrainTimeInSeconds,
            OptimizingMetric = RegressionMetric.RSquared,
            CacheBeforeTrainer = CacheBeforeTrainer.Auto
        };
        // Create experiment
        var experiment = MyContext.Auto().CreateRegressionExperiment(experimentSettings);
        var progressHandler = new Progress<RunDetail<RegressionMetrics>>(p =>
        {
            DebugWrite($"Current result - TrainerName: {p.TrainerName}, RuntimeInSeconds: {p.RuntimeInSeconds}, ValidationMetrics: {p.ValidationMetrics}");
        });
        // Run experiment (the next statement is the reported ERRORLINE 94)
        var result = Task.Run(() => experiment.Execute(DataView, labelColumnName: LabelName, progressHandler: progressHandler));
        // Get best model
        var model = result.Result.BestRun.Model;
        RSquared = result.Result.BestRun.ValidationMetrics.RSquared;
        // Create prediction engine
        PredictionEngine = MyContext.Model.CreatePredictionEngine<dynamic, ModelOutput>(model);
        IsTrained = true;
    }
    catch (Exception ex) { System.Diagnostics.Debug.WriteLine(ex.Message + " " + ex.StackTrace.ToString()); }
}
You can report it here. What exception message do you get? Is it related to a non-numeric label value for regression trainers?
No, it is not related to a non-numeric label value for regression trainers; I would not have perceived that as a bug.
Sorry, I forgot to mention the exception (stupid of me). Here it is: NullReferenceException: Object reference not set to an instance of an object.
Best rgds, Ton
That bug sounds so familiar to me. Could it be similar to this issue? https://github.com/dotnet/machinelearning/issues/6558
There are similarities:
In my case too, running the experiment from Model Builder on the CSV works OK.
There are differences:
I’m running ML 2.0.1 and AutoML 0.20.1.
Nevertheless, I tried the fix (specifying TrainSet and TestSet), and now...
IT WORKS!!
Thanks a lot! Although it remains a bug ;-)
Best rgds,
Ton
Cool, glad to see you figured this out!
Hi Xiaoyun,
I was glad it worked too, but a little later it went haywire again!
System.NullReferenceException
HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=Microsoft.ML.AutoML
StackTrace:
at Microsoft.ML.AutoML.SweepablePipeline..ctor(Dictionary`2 estimators, Entity schema, String currentSchema)
at Microsoft.ML.AutoML.SweepablePipeline.AppendEntity(Boolean allowSkip, Entity entity)
at Microsoft.ML.AutoML.RegressionExperiment.CreateRegressionPipeline(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer)
at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, IDataView validationData, String labelColumnName, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at cAlgo.Predictor.
on the following code:
public async Task<(double Rsquared, bool IsTrained, PredictionEngine<ExpandoObject, ModelOutput> PredictionEngine)> Train2Async()
{
    try
    {
        // Extract header row
        var FileSizeMB = (double)(new FileInfo(TrainData.DataPath).Length) / 1024 / 1024;
        var TrainTimeInSeconds = (uint)Math.Max(10, Math.Sqrt(FileSizeMB) * TrainTimeFactor * 100).RoundUp(0);
        var MyContext = new MLContext();
        ColumnInferenceResults columnInference =
            MyContext.Auto().InferColumns(TrainData.DataPath, labelColumnName: LabelName, groupColumns: false);
        foreach (var name in TrainData.CategoricalColumnNames)
        {
            columnInference.ColumnInformation.NumericColumnNames.Remove(name);
            columnInference.ColumnInformation.CategoricalColumnNames.Add(name);
        }
        // Load data
        var data = MakeExpando();
        // Convert data to IDataView
        var dataView = MyContext.Data.LoadFromEnumerable(data);
        // Split data into training and test sets
        var trainTestSplit = MyContext.Data.TrainTestSplit(dataView);
        // Define experiment settings
        var experimentSettings = new RegressionExperimentSettings
        {
            MaxExperimentTimeInSeconds = 60,
            OptimizingMetric = RegressionMetric.RSquared,
            CacheBeforeTrainer = CacheBeforeTrainer.Auto
        };
        // Create experiment
        var experiment = MyContext.Auto().CreateRegressionExperiment(experimentSettings);
        // Run experiment (the next statement is the reported ERRORLINE)
        var result = experiment.Execute(trainTestSplit.TrainSet, trainTestSplit.TestSet);
        // Get best model
        var model = result.BestRun.Model;
        //
        RSquared = result.BestRun.ValidationMetrics.RSquared;//.result.Result.BestRun.ValidationMetrics.RSquared;
        // Get feature importance
        var featureImportance = MyContext.Regression.PermutationFeatureImportance(model, trainTestSplit.TestSet);
        var featureImportanceValues = featureImportance.Select(x => x.Value.RSquared.Mean).ToArray();
        var featureNames = data.First().Select(x => x.Key).ToArray();
        // Print feature importance
        for (int i = 0; i < featureNames.Length; i++)
        {
            Console.WriteLine($"{featureNames[i]}: {featureImportanceValues[i]:0.00}");
        }
        // Create prediction engine
        var predictionEngine = MyContext.Model.CreatePredictionEngine<ExpandoObject, ModelOutput>(model);
        return (RSquared, IsTrained, PredictionEngine);
    }
    catch (Exception ex) { Lib.LogPrint(ThisRobot, ex.Message + " " + ex.StackTrace.ToString()); }
    return (double.NaN, false, null);
}

List<ExpandoObject> MakeExpando()
{
    var data = new List<ExpandoObject>();
    // Read header row
    var header = TrainData.HeaderCollection;
    // Read data rows
    foreach (var row in TrainData.Rows)
    {
        var line = row.ToArray();
        dynamic dataPoint = new ExpandoObject();
        for (int i = 0; i < header.Length; i++)
        {
            ((IDictionary<string, object>)dataPoint)[header[i]] = (float)line[i];
        }
        data.Add(dataPoint);
    }
    return data;
}
What am I doing wrong here? I just want to do an AutoML run on a dataset whose columns are not defined at instantiation.
The columns and their headers are only known after loading the CSV (similar to mbconfig).
Best Rgds,
Ton
@Elantonio Would you still get the error if you remove the following lines?
foreach (var name in TrainData.CategoricalColumnNames)
{
    columnInference.ColumnInformation.NumericColumnNames.Remove(name);
    columnInference.ColumnInformation.CategoricalColumnNames.Add(name);
}
Hi Xiaoyun,
Sorry, to no avail; the error is still the same:
System.NullReferenceException
HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=Microsoft.ML.AutoML
StackTrace:
at Microsoft.ML.AutoML.SweepablePipeline..ctor(Dictionary`2 estimators, Entity schema, String currentSchema)
at Microsoft.ML.AutoML.SweepablePipeline.AppendEntity(Boolean allowSkip, Entity entity)
at Microsoft.ML.AutoML.RegressionExperiment.CreateRegressionPipeline(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer)
at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, IDataView validationData, String labelColumnName, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
Best rgds, Ton.
PS: I feel I’m missing a clue about how to train on dynamic input, reuse its prediction engine, and get column importance info. Do you know of a sample where this has been done before?
Scenario: Value Prediction
Data with a categorical column to predict (values -1, 0, 1)
Evaluate gives a prediction of -0.01 !?!?
The prediction should be -1, 0 or 1!