Closed beccamc closed 2 years ago
@LittleLittleCloud FYI on this issue.
It's related to a feature flag; I'll fix this.
The error 2021-08-03 17:37:55.2373 DEBUG System.FormatException: String ' available trainer: LGBM, RF, FASTTREE, LBFGS, SDCA ' was not recognized as a valid Boolean.
is not fatal. It just causes the feature flag (FF) manager to return false and disable the affected functions.
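To illustrate why that exception is harmless, here is a minimal Python sketch of the behavior described above. It is not the actual FF manager code: `parse_bool_strict` and `is_feature_enabled` are hypothetical names, and the strict parser only imitates how System.Boolean.Parse accepts "true"/"false" (case-insensitive, surrounding whitespace ignored) and rejects everything else.

```python
def parse_bool_strict(text: str) -> bool:
    """Imitate a strict Boolean parser: only 'true' or 'false' are accepted
    (case-insensitive, surrounding whitespace ignored)."""
    normalized = text.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"String '{text}' was not recognized as a valid Boolean.")


def is_feature_enabled(raw_value: str) -> bool:
    """Hypothetical flag lookup: a malformed value disables the feature
    instead of crashing the caller."""
    try:
        return parse_bool_strict(raw_value)
    except ValueError:
        # Assumption: the real manager logs the exception at DEBUG level
        # and then treats the flag as off.
        return False
```

With this shape, a flag whose value is the string from the log simply evaluates to false, which matches the "disabled functions, no crash" behavior reported here.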
The fatal error is in data processing: PROSE determines that the dataset has no header ("false") when it actually has one. However, even setting --has-header=true
doesn't resolve this. I believe that's because mlnet (or the data processing engine) doesn't respect that flag for some reason.
Meanwhile, Model Builder also can't train on that dataset. First, it shows that the dataset has 5 columns when it only has 2. After setting the header flag to true, it still shows 5 headers and throws "An item with same key has already been added". Training fails as well.
@beccamc Can you fix this on the Model Builder side? I'll take over the mlnet.cli side.
@beccamc Model Builder still fails with the error "An item with the same key has already been added." on ML.Net Model Builder: 16.9.1.2155901 (Main). Column Headers: No; Column Headers: Yes.
This issue does not repro on mlnet: 16.9.2
The problem here isn't the spam/ham thing, but that there are three empty columns.
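A quick way to confirm this kind of problem outside Model Builder is to scan the file for empty columns and repeated header names. The Python sketch below uses only the standard library; `inspect_columns` is a hypothetical helper, and the sample rows are made up for illustration rather than taken from the attached dataset.

```python
import csv
import io
from collections import Counter


def inspect_columns(csv_text: str):
    """Return (indices of columns that are empty in every data row,
    header names that appear more than once)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [], []
    header, data = rows[0], rows[1:]
    width = max(len(row) for row in rows)

    def cell(row, index):
        # Treat missing trailing cells as empty strings.
        return row[index].strip() if index < len(row) else ""

    empty = [i for i in range(width) if all(cell(r, i) == "" for r in data)]
    counts = Counter(name.strip() for name in header)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return empty, duplicates


# Made-up sample: two real columns followed by three empty ones.
sample = "label,text,,,\nham,hello there,,,\nspam,win a prize,,,\n"
empty_columns, duplicate_names = inspect_columns(sample)
print(empty_columns)    # [2, 3, 4]
print(duplicate_names)  # ['']
```

Three trailing empty columns all share the empty string as their header name, which is consistent with a "same key has already been added" style failure when headers are used as dictionary keys. Whether that is the exact code path in Model Builder is an assumption.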
Two things to verify...
Verified this issue on latest main: 16.9.1.2160801. For "Column Headers: Yes", it shows the note "Cannot have multiple columns with same name. Please rename or remove column."
For "Column Headers: No" and "dataset without extra columns", training completes.
I'm guessing the problem is that it's "boolean" classification, but using spam/ham instead of true/false or 0/1.
From the log file:
2021-08-03 17:37:55.2373 DEBUG System.FormatException: String ' available trainer: LGBM, RF, FASTTREE, LBFGS, SDCA ' was not recognized as a valid Boolean.
   at System.Boolean.Parse(ReadOnlySpan`1 value)
   at System.Boolean.Parse(String value)
Spam dataset