aerdem4 / lofo-importance

Leave One Feature Out Importance
MIT License

Having a lot of features + Using LOFO? #34

Closed Mymoza closed 3 years ago

Mymoza commented 3 years ago

Hi,

I have 1673 features. When I tried using LOFO importance, the result is the following:

[screenshot: lofo_importance_result — importance plot with feature labels overlapping]

Are the features showing up one on top of the other because the plot isn't long enough? What would you suggest to fix this problem?

Thank you

aerdem4 commented 3 years ago

I think the issue is indeed with the plot: it cannot visualize all 1673 features at the given figsize. You can try a larger figsize, but then it may become a very large image. Can you tell me more about these 1673 features? If you are able to group them, you can get better results. LOFO's Dataset can take a dictionary of feature groups.
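Another way to keep the plot readable (a sketch, not part of the library; it assumes `importance_df` is a DataFrame with `feature` and `importance_mean` columns, as returned by `LOFOImportance.get_importance()` in the README) is to plot only the top-N features:

```python
import pandas as pd

def top_n_importances(importance_df, n=30):
    """Keep only the n features with the largest mean importance,
    so the importance plot stays readable at a normal figsize."""
    return (importance_df
            .sort_values("importance_mean", ascending=False)
            .head(n)
            .reset_index(drop=True))

# Toy example with three fake features
toy = pd.DataFrame({
    "feature": ["f1", "f2", "f3"],
    "importance_mean": [0.1, 0.5, 0.3],
})
top = top_n_importances(toy, n=2)
print(top["feature"].tolist())  # → ['f2', 'f3']
```

The truncated frame can then be passed to `plot_importance` as usual, and the remaining features can be inspected numerically instead of visually.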

Mymoza commented 3 years ago

My features are from tsfresh. The reason I have so many is that we tried different coefficients for these features, so the total grows to a high number of features.

Grouping them as one would do using FLOFO? I tried that, but my validation set has only 434 examples, so I was not meeting the minimum requirement of 1000 examples. I wanted to change the number of bins to fix that, but it doesn't seem to be an argument that's easy to override..?

aerdem4 commented 3 years ago

Grouping as in the example on README:

    dataset = Dataset(df=df[df[target].notnull()], target=target, features=loading_features,
                      feature_groups={"fnc": df[df[target].notnull()][fnc_features].values})
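For tsfresh output, one way to build that dictionary (a sketch; the column names below are hypothetical, and the grouping-by-prefix heuristic is my suggestion, not part of lofo-importance) is to bundle all coefficient columns that share the same base feature name into one 2D array per group:

```python
import pandas as pd

def group_by_prefix(df, sep="__"):
    """Group columns sharing a prefix before `sep` (tsfresh-style names
    like 'x__fft_coefficient__coeff_0') into a dict of 2D arrays,
    shaped as Dataset's feature_groups expects (one row per sample)."""
    groups = {}
    for col in df.columns:
        prefix = col.split(sep)[0]
        groups.setdefault(prefix, []).append(col)
    return {name: df[cols].values for name, cols in groups.items()}

# Toy frame with two tsfresh-style groups (hypothetical names)
toy = pd.DataFrame({
    "x__fft_coefficient__coeff_0": [1.0, 2.0],
    "x__fft_coefficient__coeff_1": [3.0, 4.0],
    "y__mean": [5.0, 6.0],
})
feature_groups = group_by_prefix(toy)
# "x" gets a (2, 2) array, "y" a (2, 1) array
```

Each group then counts as a single feature for LOFO, which shrinks 1673 columns down to a much smaller number of rows on the importance plot.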