TuxML / size-analysis

Analysis of 125+ Linux configurations (this time for predicting/understanding kernel sizes)

Non-boolean and non-tristate options #12

Open FAMILIAR-project opened 5 years ago

FAMILIAR-project commented 5 years ago

There are a few options (~100) that are neither boolean nor tristate (they take numerical or string values). We chose to remove them, which is reasonable given the number of features we already have. Yet we may have missed an opportunity.

@HugoJPMartin can you give the precise list of options that we remove in the first place?

FAMILIAR-project commented 5 years ago

Another use case where I need the list: commit 27fb80b

(I wanted to compute the frequency of some options in the dataset, and found that some options have been removed. Is that due to the removal of options that have a single unique value?)
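One way to check this hypothesis is to list the options that take a single value across all configurations. This is only a minimal sketch: it assumes each configuration is available as a dict mapping option names to values, and the function name `constant_options` and the toy data are hypothetical, not taken from the repository.

```python
def constant_options(configs):
    """Return the sorted names of options that take exactly one value across all configs."""
    observed = {}
    for config in configs:
        for option, value in config.items():
            observed.setdefault(option, set()).add(value)
    return sorted(option for option, values in observed.items() if len(values) == 1)


# Hypothetical toy data, not taken from the TuxML dataset.
sample_configs = [
    {"OPTION_A": "y", "OPTION_B": "n"},
    {"OPTION_A": "y", "OPTION_B": "m"},
]
constant = constant_options(sample_configs)
print(constant)
```

If the options missing from the frequency computation all appear in such a list, the removal of constant columns would explain their absence.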

HugoJPMartin commented 5 years ago

I added a notebook to explore this, along with a file listing all non-tristate options and their possible values, in commit 8eadd1c
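For readers without access to the notebook, a list like this could be computed roughly as follows. This is a sketch under assumptions, not the code from commit 8eadd1c: it assumes boolean and tristate options are encoded with the Kconfig-style values `y`, `n`, and `m`, that each configuration is a dict of option names to string values, and the name `non_tristate_options` and the sample data are hypothetical.

```python
from collections import defaultdict

# Assumed encoding of boolean/tristate values (Kconfig-style).
TRISTATE_VALUES = {"y", "n", "m"}


def non_tristate_options(configs):
    """Map each option with at least one value outside y/n/m to its sorted observed values."""
    observed = defaultdict(set)
    for config in configs:
        for option, value in config.items():
            observed[option].add(value)
    return {
        option: sorted(values)
        for option, values in observed.items()
        if not values <= TRISTATE_VALUES  # keep options with any non-tristate value
    }


# Hypothetical toy data, not taken from the TuxML dataset.
sample_configs = [
    {"OPTION_A": "y", "OPTION_B": "m", "OPTION_LEVEL": "7"},
    {"OPTION_A": "n", "OPTION_B": "y", "OPTION_LEVEL": "4"},
]
non_tristate = non_tristate_options(sample_configs)
print(non_tristate)
```

Keeping the observed values alongside each option name makes it easier to decide later whether a numerical or string option is worth encoding as a feature rather than dropping it.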

FAMILIAR-project commented 5 years ago

@HugoJPMartin thanks! Can you push the scripts you used for encoding the data, either here or in https://github.com/TuxML/tuxml-datasets? I need them to encode the 4.15 data (see #13)