TuxML / size-analysis

Analysis of 125+ Linux configurations (this time for predicting/understanding kernel sizes)

Generic load dataset #10

Open FAMILIAR-project opened 5 years ago

FAMILIAR-project commented 5 years ago

Right now, each of us uses an ad hoc method for loading the dataset. We need to unify the process. So here is the plan:

FAMILIAR-project commented 5 years ago

I've put some instructions in the README:

FAMILIAR-project commented 5 years ago

Hum... I realize that all_size_withyes.pkl contains options that have a single unique value (i.e., they are constant across all configurations)... We can safely remove them. Keeping them can be a problem for multicollinearity, and removing them can help ML algorithms scale better.
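For illustration only, here is a minimal sketch of how such constant options could be dropped with pandas. It assumes the pickle loads as a pandas DataFrame with one column per option; the helper name `drop_constant_columns` and the toy column names are hypothetical and are not taken from the dataset.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns that hold a single distinct value."""
    # dropna=False counts NaN as a value, so an all-NaN column is also treated as constant.
    n_distinct = df.nunique(dropna=False)
    constant = n_distinct[n_distinct <= 1].index
    return df.drop(columns=constant)


# Toy frame standing in for the real dataset (hypothetical option names).
toy = pd.DataFrame({
    "OPT_A": [1, 1, 1],       # same value everywhere: dropped
    "OPT_B": [0, 1, 0],       # varies: kept
    "vmlinux": [10, 12, 11],  # size column varies: kept
})
reduced = drop_constant_columns(toy)
print(list(reduced.columns))
```

On the real data, the same call would presumably be applied to the result of `pd.read_pickle("all_size_withyes.pkl")`, if the file is indeed a pickled DataFrame.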

I will try to release a new dataset ASAP...

FAMILIAR-project commented 5 years ago

I've released a new version of the dataset. Please update your local clone (git pull): https://gitlab.com/FAMILIAR-project/tuxml-size-analysis-datasets/

Don't be surprised that 'nbyes' differs: it is now computed over the features that remain in the dataset (so it's the old nbyes minus the number of features having a unique value).
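To make the change in 'nbyes' concrete, here is a rough sketch of counting "yes" options per configuration before and after dropping a constant option. It assumes a "yes" is encoded as 1 and that the count is a simple per-row sum over option columns; the encoding, the helper name `count_yes`, and the toy column names are all assumptions, not the dataset's actual layout.

```python
import pandas as pd


def count_yes(df: pd.DataFrame, option_columns, yes_value=1) -> pd.Series:
    """Count, per row, how many of the given option columns equal yes_value."""
    return (df[option_columns] == yes_value).sum(axis=1)


# Toy configurations (hypothetical option names; 1 is assumed to mean "yes").
toy = pd.DataFrame({
    "OPT_A": [1, 1, 1],  # constant option, always "yes"
    "OPT_B": [0, 1, 0],
    "OPT_C": [1, 0, 0],
})

old_nbyes = count_yes(toy, ["OPT_A", "OPT_B", "OPT_C"])  # before removal
new_nbyes = count_yes(toy, ["OPT_B", "OPT_C"])           # after dropping OPT_A
print((old_nbyes - new_nbyes).tolist())
```

In this toy example every row loses exactly one "yes" because the single dropped option was always set to "yes".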