Closed nicolehaugen closed 4 years ago
Hi Nicole. For scenario 4, please open an issue on the LightGBM repository, as the error message is actually coming from their code base.
Summary: Scenario 1 is not reproducible. Scenarios 2 and 3 were fixed. Scenario 4 is an error thrown by LightGBM, not ML.NET.
Many of the exception messages thrown are unclear. As a result, when an exception occurs, it's hard to tell whether the issue is with the ML.NET code, with the underlying data, with how the algorithm is being applied, etc. Getting further context often requires stepping through the ML.NET framework.
I logged this as a single issue because I think it would be worth reviewing every place where exceptions are thrown or rethrown, to ensure that default exception messages aren't used and that the messages are as clear and informative as possible. Let me know if you would prefer these split into separate issues rather than combined in one.
Here are some specific examples:
[LoadColumn(100)] public uint Label { get; set; }
In the above code, 100 is an invalid column index because the underlying data has fewer than 100 columns.
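To illustrate the kind of message that would make the out-of-range index above easy to diagnose, here is a minimal sketch of a pre-load check. `ColumnIndexCheck` and `FindInvalidIndices` are hypothetical names, not part of ML.NET, and the sketch assumes a delimited text file whose first line can be split naively on the separator (no quoted fields).

```csharp
using System;
using System.Collections.Generic;

public static class ColumnIndexCheck
{
    // Hypothetical helper, not an ML.NET API: counts the columns in the first
    // line of a delimited file and describes every requested index that falls
    // outside that range.
    public static List<string> FindInvalidIndices(
        string firstLine, char separator, IDictionary<string, int> requested)
    {
        // Naive split: assumes the separator never appears inside a quoted field.
        int columnCount = firstLine.Split(separator).Length;
        var problems = new List<string>();

        foreach (var pair in requested)
        {
            if (pair.Value < 0 || pair.Value >= columnCount)
            {
                problems.Add(
                    $"Property '{pair.Key}' requests column index {pair.Value}, " +
                    $"but the data has only {columnCount} columns " +
                    $"(valid indices: 0 to {columnCount - 1}).");
            }
        }

        return problems;
    }
}
```

Calling it with the property names and the indices used in `LoadColumn` would name both the offending property and the valid range, which is the context the current exception lacks.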
[ColumnName("Test"), LoadColumn(135)] public uint Test { get; set; }
var customGains = new LightGbmRankingTrainer.Options();
customGains.CustomGains = new int[] { 0, 1, 2, 3 };
IEstimator<ITransformer> trainer = mlContext.Ranking.Trainers.LightGbm(customGains);
IEstimator<ITransformer> trainerPipeline = dataPipeline.Append(trainer);
Notice that in the above code, the Group Id isn’t being explicitly set as follows:
customGains.RowGroupColumnName = "GroupId";
var customGains = new LightGbmRankingTrainer.Options();
customGains.CustomGains = new int[] { 0, 1, 2 };
customGains.RowGroupColumnName = "GroupId";
In the underlying data, the relevance label values are {0, 1, 2, 3, 4}. In other words, the label takes five distinct values, but only three custom gains are specified.
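The mismatch above could be caught before training with a check along these lines. This is only a sketch: `CustomGainsCheck` and `EnsureGainsCoverLabels` are hypothetical names, not ML.NET or LightGBM APIs, and it assumes that each integer label is used as a zero-based index into the gains array, so the array needs more entries than the largest label value.

```csharp
using System;
using System.Linq;

public static class CustomGainsCheck
{
    // Hypothetical helper: assumes each label is a zero-based index into
    // customGains, so the largest label must be smaller than the array length.
    public static void EnsureGainsCoverLabels(uint[] labels, int[] customGains)
    {
        if (labels.Length == 0)
        {
            return; // Nothing to validate.
        }

        uint maxLabel = labels.Max();
        if (maxLabel >= (uint)customGains.Length)
        {
            throw new ArgumentException(
                $"The largest relevance label is {maxLabel}, but only " +
                $"{customGains.Length} custom gains were specified. " +
                $"Provide at least {maxLabel + 1} gains, one per label value from 0 to {maxLabel}.");
        }
    }
}
```

With the labels {0, 1, 2, 3, 4} and gains { 0, 1, 2 } from this scenario, the check throws an `ArgumentException` that names both the largest label and the number of gains supplied.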