System Information (please complete the following information):
OS & Version: Windows 10
ML.NET Version: 4.0.0-preview.24271.1
.NET Version: 8.0.8
Describe the bug
Initializing MLContext without a seed (new MLContext()) and training on the same data does not actually result in different models from Regression.Trainers.FastForest() or Regression.Trainers.LightGbm().
To Reproduce
Steps to reproduce the behavior:
Initialize an MLContext without a seed.
Create a model on the data with FastForest.
Initialize a different MLContext without a seed.
Create a model on the exact same data with FastForest.
Compare the predictions of both models (in my case, I compared 23645 predictions).
Repeat the process with LightGbm.
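The steps above can be sketched roughly as follows. This is a minimal sketch, not the code used for this report: the `Row` and `Prediction` classes, the `TrainAndScore` helper, and the synthetic data are hypothetical stand-ins for the real dataset, and it assumes the Microsoft.ML and Microsoft.ML.FastTree packages are referenced.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input schema; the real data in this report is different.
public class Row
{
    [VectorType(2)]
    public float[] Features { get; set; }
    public float Label { get; set; }
}

// Only the Score column is read back from the scored data.
public class Prediction
{
    public float Score { get; set; }
}

public static class Repro
{
    // Trains FastForest in a fresh, unseeded MLContext and returns its scores.
    static float[] TrainAndScore(Row[] rows)
    {
        var ml = new MLContext(); // no seed passed
        IDataView data = ml.Data.LoadFromEnumerable(rows);
        var model = ml.Regression.Trainers.FastForest().Fit(data);
        IDataView scored = model.Transform(data);
        return ml.Data.CreateEnumerable<Prediction>(scored, reuseRowObject: false)
                 .Select(p => p.Score)
                 .ToArray();
    }

    public static void Main()
    {
        // Synthetic data (an assumption): the same rows are fed to both runs.
        var rng = new Random(0);
        Row[] rows = Enumerable.Range(0, 1000).Select(_ =>
        {
            float x1 = (float)rng.NextDouble();
            float x2 = (float)rng.NextDouble();
            return new Row
            {
                Features = new[] { x1, x2 },
                Label = 3 * x1 - 2 * x2 + 0.1f * (float)rng.NextDouble()
            };
        }).ToArray();

        float[] first = TrainAndScore(rows);
        float[] second = TrainAndScore(rows);

        // Count predictions that are not bit-for-bit equal between the two models.
        int differing = first.Zip(second, (a, b) => a != b).Count(d => d);
        Console.WriteLine($"{differing} of {first.Length} predictions differ");
    }
}
```

For the LightGbm variant, swapping `FastForest()` for `LightGbm()` (Microsoft.ML.LightGbm package) should be enough under the same assumptions.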
Expected behavior
As I see it, the first FastForest predictions should be different from the second FastForest predictions. Same for LightGbm.
Even the slightest change in randomness in the bootstrapped dataset selection should lead to different results.
It seems that FastForest and LightGbm in ML.NET are not so random.. :{
Further Research (PLZ READ ME TOO)
I played with LightGBM in Python and was able to introduce randomness into it with these params: 'feature_fraction': 0.2, 'seed': rand_num. Removing either one of them also removes the randomness in the results, see code:
Workarounds
A workaround for LightGbm is to add FeatureFraction to the params.
Sending Seed as part of FastForestRegressionTrainer.Options is a workaround for FastForest.
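A rough sketch of both workarounds, assuming the Microsoft.ML.FastTree and Microsoft.ML.LightGbm packages. Two things here are assumptions rather than something this report verified: that FeatureFraction is set through the booster options (`GradientBooster.Options`), and that passing `Seed` to LightGbm is needed as well (it is included only to mirror the Python experiment). The value 0.2 is taken from that experiment, not a recommendation.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Trainers.FastTree;
using Microsoft.ML.Trainers.LightGbm;

var ml = new MLContext();

// A fresh seed per run, so two runs on the same data can diverge.
int seed = new Random().Next();

// FastForest workaround: pass Seed explicitly through the trainer options.
var forest = ml.Regression.Trainers.FastForest(
    new FastForestRegressionTrainer.Options
    {
        Seed = seed
    });

// LightGbm workaround: set FeatureFraction below 1.
// Assumption: FeatureFraction lives on the booster options in this ML.NET version.
var lgbm = ml.Regression.Trainers.LightGbm(
    new LightGbmRegressionTrainer.Options
    {
        Seed = seed, // assumption: mirrors 'seed' from the Python params
        Booster = new GradientBooster.Options { FeatureFraction = 0.2 }
    });
```

Both trainers can then be fitted with `Fit(data)` exactly as before; only the options differ from the default calls.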