fstandhartinger / LightGbmDotNet

A .NET wrapper for the LightGBM machine learning library
MIT License

Error during LightGBM run #2

Open pjsgsy opened 4 months ago

pjsgsy commented 4 months ago

Hi,

I know this is an old project and perhaps abandoned, but it was exactly what I was looking for! I downloaded the source and built it OK, added it to my project, and am now attempting to train. I pass a List<List<double>> as the training data. At the call to .Train(trainingdata), an exception occurs stating that I don't have enough rows, yet when I inspect the list of List<double>, the rows are there: a List containing 30k Lists, each 39 doubles in length.

Exception: Cannot construct Dataset since there are not useful features. It should be at least two unique rows. If the num_row (num_data) is small, you can set min_data=1 and min_data_in_bin=1 to fix this. Otherwise please make sure you are using the right dataset.

The List is definitely 30k items, and each item is a List of 39 doubles.

Am I being a complete luddite? I could not find a simple code example of usage...

Any clues?

Thanks!

pjsgsy commented 4 months ago

OK - I figured this out. I changed my temp folder to one I could more easily monitor and saw that the .csv was in fact being written, but with the same record on every row. A bug on my side! So, that issue is resolved. Thank you! For anyone else who is even more of a newbie than me, this is the code I ended up with:

lightGbmFF = new LightGbm(false, @"b:\temp");   // b:\temp is the temp folder I monitor for the .csv
private List<double[]> lgbmFeatures = new List<double[]>();

// Build one row and add it to the training set
double[] lgbmFeature = new double[40];
lgbmFeature[0] = (double)dir;
for (uint k = 0; k < numFeatures; k++)
    lgbmFeature[k + 1] = FEATURE[historicalBar][k];   // data (features)
lgbmFeatures.Add(lgbmFeature);

// Train once the rows have been collected
if (lgbmFeatures.Any())
{
    Print("Training LightGBM with " + lgbmFeatures.Count + " rows");
    lightGbmFF.Train(lgbmFeatures);
    lgbmFeatures = null;
}
lightGbmFF.Dispose();
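
Since the exception above came from a training set whose rows were all identical, a quick check before calling .Train can catch that earlier. This is only a sketch: `TrainingDataChecks` and `CountDistinctRows` are hypothetical names and not part of LightGbmDotNet, and it assumes the rows are held as a List<double[]> like in the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of LightGbmDotNet.
public static class TrainingDataChecks
{
    // Counts rows that differ by value (not by reference), so a list that
    // holds the same array many times, or many copies of one record, counts as 1.
    public static int CountDistinctRows(IEnumerable<double[]> rows)
    {
        var seen = new HashSet<string>();
        foreach (var row in rows)
        {
            // Key each row on the exact bit pattern of every double.
            seen.Add(string.Join("|", row.Select(v => BitConverter.DoubleToInt64Bits(v))));
        }
        return seen.Count;
    }
}
```

If `TrainingDataChecks.CountDistinctRows(lgbmFeatures)` comes back below 2, the "at least two unique rows" error is to be expected and the problem is in how the rows are being built, not in LightGBM.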

Once again, thanks for sharing this. Even now, after all these years, I am not sure there is anything else for .NET 4.8 that allows LightGBM usage.

pjsgsy commented 4 months ago

Further, in case it helps anyone else, given that I could not find any code examples for usage (though I guess the code itself is documented well enough).

For multiclass classification, you will need to pass some parameters to .Train, like this:

Parameters Param = Parameters.DefaultForMulticlassClassification.Clone();
Param.AddOrReplace(new LightGbmDotNet.Parameter("num_classes", "3"));
lightGbmFF.Train(lgbmFeatures, Param);

There is a full list of parameters here:

https://lightgbm.readthedocs.io/en/latest/Parameters.html

From a quick look at the code, I think the same defaults are returned for all types, so you will need to set them yourself. I am sure all this is clear to the author, but for me, stumbling on this, it was not...

Unfortunately for me, having got this far, predictions now fail with the error 'Input string was not in a correct format.', even though the calling code is the same as when I was doing binary classification, which worked OK.

In the LightGBM_predict_result.txt file, the correct 3 classes and values are there, such as

0.54171384829490787 0.23963322102351683 0.21865293068157535

So, not sure why this code inside LightGbm.cs fails. If I find a fix, I will share it.

pjsgsy commented 4 months ago

OK - Got it. It seems the code here might not actually support multiclass classification! It is trying to parse the returned (correct) result, but does so with

double.Parse(l, englishCulture)

Yet, the string it is trying to parse is multiclass and reads

"0.20059427917564951\t0.76653742731652563\t0.032868293507824817"

So, it would seem this needs fixing. I can probably do that. If this is not a dead project, please let me know and I will share the code if it is of interest. If not, perhaps I will fork this for my own use.
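
For reference, a minimal sketch of what the parsing could look like. It assumes each line of the result file holds one tab-separated probability per class, as in the string above. `PredictionParsing`, `ParsePredictionLine` and `PredictedClass` are hypothetical names that do not exist in LightGbmDotNet, and `CultureInfo.InvariantCulture` stands in for the `englishCulture` used in LightGbm.cs.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of LightGbmDotNet.
public static class PredictionParsing
{
    // Splits one result line on tabs and parses every field as a double.
    // A binary-classification line (a single value) yields a one-element array.
    public static double[] ParsePredictionLine(string line)
    {
        return line
            .Split(new[] { '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => double.Parse(s.Trim(), CultureInfo.InvariantCulture))
            .ToArray();
    }

    // Returns the index of the largest probability (first one wins on ties).
    public static int PredictedClass(double[] probabilities)
    {
        if (probabilities == null || probabilities.Length == 0)
            throw new ArgumentException("No probabilities to choose from.");

        int best = 0;
        for (int i = 1; i < probabilities.Length; i++)
        {
            if (probabilities[i] > probabilities[best])
                best = i;
        }
        return best;
    }
}
```

With the line quoted above, `ParsePredictionLine` gives three values and `PredictedClass` picks index 1, the class with probability 0.7665.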

Thanks!