Closed: daholste closed this issue 4 years ago.
For now, the plan is as follows: binary classification will support only boolean labels. If your data contains missing values, load the label as float or text and either filter those rows out or create a mapping from these values to boolean. Float-to-boolean conversion should start working after this PR: https://github.com/dotnet/machinelearning/pull/2804
As for text labels, I think the text loader currently accepts 'True' and 'False' as boolean values. For anything else, such as 'Positive'/'Negative' or 'Cool'/'Not cool', you currently need to implement a custom mapping or use ValueMap.
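A minimal sketch of such a custom mapping in plain C#, independent of ML.NET: text values are looked up in an explicit table, and missing or unrecognized values come back as null so those rows can be filtered out instead of guessed. Which value counts as true is an assumption made here for illustration, and `labelMap`/`MapLabel` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical table: deciding which text value means "true" is up to the data owner.
var labelMap = new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase)
{
    ["Positive"] = true,
    ["Negative"] = false,
    ["Cool"] = true,
    ["Not cool"] = false,
};

// Returns null for missing or unrecognized labels, so the caller can filter those rows out.
bool? MapLabel(string raw)
{
    if (string.IsNullOrWhiteSpace(raw))
        return null;
    return labelMap.TryGetValue(raw.Trim(), out var value) ? (bool?)value : null;
}

// Example: map raw labels and keep only the rows whose label could be mapped.
var rawLabels = new[] { "Positive", "negative", "", "Cool", "maybe" };
var mapped = new List<bool>();
foreach (var text in rawLabels)
{
    var label = MapLabel(text);
    if (label.HasValue)
        mapped.Add(label.Value);
}
Console.WriteLine(string.Join(", ", mapped)); // prints "True, False, True"
```

In a real pipeline the same rule would live inside a custom mapping or be expressed as ValueMap pairs.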
Thanks, @Ivanidzo4ka !
Regarding "Float to boolean conversion should start work after this PR: #2804": do you have any plans for key-to-Boolean conversion? That would help on our side.
That can be quite tricky. We can convert a key back to its original type, but converting it to some other specific type feels somewhat weird. A key is basically a dictionary built at runtime, and it doesn't make much sense to me to cast a dictionary, which can contain whatever you want, to boolean.
Why do you need this conversion?
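To illustrate why a key is "basically a dictionary built at runtime", here is a simplified model in plain C# (not ML.NET's implementation): each distinct value gets the next key in order of first occurrence, starting at 1 because key 0 stands for a missing value. The by-occurrence ordering is an assumption modeled on value-to-key mapping's default, and `BuildKeys` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Simplified model of value-to-key mapping (not ML.NET's implementation): each distinct
// value gets the next key in order of first occurrence, starting at 1 (0 means missing).
Dictionary<string, uint> BuildKeys(IEnumerable<string> values)
{
    var keys = new Dictionary<string, uint>();
    foreach (var v in values)
    {
        if (!keys.ContainsKey(v))
            keys[v] = (uint)(keys.Count + 1);
    }
    return keys;
}

// The same two labels, seen in a different order, end up with swapped keys.
var a = BuildKeys(new[] { "Cool", "Not cool", "Cool" });
var b = BuildKeys(new[] { "Not cool", "Cool" });
uint coolInA = a["Cool"];
uint coolInB = b["Cool"];
Console.WriteLine($"{coolInA} {coolInB}"); // prints "1 2"
```

Any fixed rule such as "key 1 means true" would therefore flip depending on which label happens to appear first in the training data.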
If a dataset has a text label with only 2 values, we want to do something like:
mlContext.Transforms.Conversion.MapValueToKey("Label")
.Append(mlContext.Transforms.Conversion.ConvertType("Label", outputKind: DataKind.Boolean))
.Append(mlContext.BinaryClassification.Trainers.LightGbm())
I noticed that
mlContext.Transforms.Conversion.MapValueToKey("Label")
.Append(mlContext.Transforms.Conversion.ConvertType("Label", outputKind: DataKind.Single))
converts a key type to a float? Is this correct? If so, after your PR (https://github.com/dotnet/machinelearning/pull/2804), could we do something like
mlContext.Transforms.Conversion.MapValueToKey("Label")
.Append(mlContext.Transforms.Conversion.ConvertType("Label", outputKind: DataKind.Single))
.Append(mlContext.Transforms.Conversion.ConvertType("Label", outputKind: DataKind.Boolean))
.Append(mlContext.BinaryClassification.Trainers.LightGbm())
? Does a better way come to mind to transform a text label to a Boolean form (that a binary classification trainer requires)? Thanks for your time!
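The proposed key-to-float-to-Boolean chain may not yield a usable label. This sketch rests on two assumptions that are not confirmed here: that converting a two-class key column to Single yields the raw key values 1 and 2, and that the float-to-boolean conversion treats any nonzero value as true, the way `System.Convert.ToBoolean(float)` does. It only shows the consequence if both hold.

```csharp
using System;

// Assumption 1: a two-class key column converted to Single holds the raw key values 1 and 2.
float[] keysAsFloat = { 1f, 2f, 1f };

// Assumption 2: float-to-boolean treats every nonzero value as true, like Convert.ToBoolean(float).
bool[] asBool = Array.ConvertAll<float, bool>(keysAsFloat, k => Convert.ToBoolean(k));

// Both classes collapse to true, so the resulting label no longer distinguishes them.
Console.WriteLine(string.Join(", ", asBool)); // prints "True, True, True"
```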
@daholste Perhaps a custom transform would be in order? You can specify the exact mapping you want. This would let you map user-supplied values to booleans. Like @Ivanidzo4ka said, it's not clear a priori what value(s) would map to true or false.
You can define a custom transform like this:
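// Define a custom function. ConversionLogic stands for your own rule that decides
// which label values map to true and which map to false.
Action<ClassWithKey, ClassWithBool> convertLabelToBoolean = (input, output) =>
{
    output.Label = ConversionLogic(input.Label);
    // Copy the rest over too.
};
// Create a pipeline to execute the custom function. A null contractName is fine as long
// as the resulting model does not need to be saved and loaded.
var pipeline = mlContext.Transforms.CustomMapping(convertLabelToBoolean, contractName: null);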
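For a key-typed input such as `ClassWithKey`, `ConversionLogic` has to resolve the key back to the value it stands for before it can decide between true and false. A minimal sketch, assuming the key column's vocabulary (its original values in key order) is available: in ML.NET it is stored with the key column as metadata, but here it is hard-coded, and both `keyValues` and `PositiveValue` are hypothetical.

```csharp
using System;

// Hypothetical vocabulary of the key column, in key order (key 1 -> index 0).
// In a real pipeline this would come from the key column's metadata, not be hard-coded.
string[] keyValues = { "Not cool", "Cool" };
const string PositiveValue = "Cool"; // hypothetical: which original value means "true"

// Resolve the key back to its original text value, then compare it with the chosen positive value.
bool ConversionLogic(uint key)
{
    if (key == 0 || key > keyValues.Length)
        throw new ArgumentOutOfRangeException(nameof(key), "Key 0 means missing; other keys must be in the vocabulary.");
    return keyValues[key - 1] == PositiveValue;
}

Console.WriteLine(ConversionLogic(2)); // prints "True"
```

Throwing on key 0 is one choice; filtering rows with a missing label out before the mapping would work as well.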
Closing this issue, as a suggestion has already been given and we haven't heard back from the user in more than a year. Feel free to reopen if necessary.
@justinormont points out (https://github.com/dotnet/machinelearning-automl/issues/255):
When the "Label" column is text, calling
results in the exception
Would you have any recommendation for handling these kinds of scenarios?