cloudbopper / anamod

Feature Importance Analysis of Models
MIT License

Assert clause prevents finding important features. #9

Open franciscomalveiro opened 1 year ago

franciscomalveiro commented 1 year ago

Hello! First of all, thank you for developing this project. I've been trying to use it to extract feature importance from LSTM and Transformer models, but I've stumbled on an assert clause that stops the process.

More specifically:

When no features are considered important, everything runs smoothly and the terminal displays: No important features identified, skipping window feature importance window visualization.

However, when the framework detects important features, it hits an assert clause, this one: https://github.com/cloudbopper/anamod/blob/556e2478517c9bd71db2f4c40990d1f6edfa1623/anamod/core/perturbations.py#L102

perturbed_slice.base is None, whereas the assert clause expects it to be X_hat. The assertion therefore fails and the process stops.
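For readers unfamiliar with the attribute, here is a minimal NumPy sketch (not anamod's code) of how .base behaves. The name X_hat is borrowed from the assert for illustration only; the array shape and the other variable names are assumptions.

```python
import numpy as np

# Illustrative stand-in for the array the assert refers to; the shape is an assumption.
X_hat = np.zeros((5, 6, 4))  # (instances, features, timesteps)

# Basic slicing returns a view: it shares memory with X_hat and records it as its base.
view_slice = X_hat[:, 2:4, :]
print(view_slice.base is X_hat)  # True

# An explicit copy owns its own data, so its base is None.
copied_slice = view_slice.copy()
print(copied_slice.base is None)  # True

# Writing through the view changes X_hat; writing to the copy does not.
view_slice[...] = 1.0
copied_slice[...] = 2.0
print(X_hat[:, 2:4, :].max(), X_hat.max())  # 1.0 1.0
```

A check that a slice's base is the original array would fail for any slice produced by a copying operation. Whether that is what happens inside anamod is not established here.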

Following the guidelines you provide under Contributions, and to find out where the issue lies (it could reside in my data, models or model wrapper), I tried to reproduce the problem using only the functions you provide:

import anamod
import synmod

output_dir = '.'
num_instances = 222
num_features = 6
fraction_relevant_features = 0.9  # 0.1 works, 0.9 hits the assert
sequence_length = 4

# Synthesize temporal data and a classifier model with synmod
synthesized_features, X, model = synmod.synthesize(
    output_dir=output_dir, num_instances=num_instances, seed=100,
    num_features=num_features, fraction_relevant_features=fraction_relevant_features,
    synthesis_type='temporal', sequence_length=sequence_length, model_type='classifier')

# Use the synthesized model's own predictions as labels
y = model.predict(X, labels=True)

importance_level = 0.1
loss_function = 'binary_cross_entropy'
feature_names = ['A', 'B', 'C', 'D', 'E', 'F']
explainer = anamod.TemporalModelAnalyzer(
    model,
    X,
    y,
    output_dir=output_dir,
    loss_function=loss_function,
    feature_names=feature_names,
    importance_significance_level=importance_level,
    visualize=True,
)

explainer.analyze()

Changing the value of fraction_relevant_features toggles between working (0.1) and failing (0.9).

PS: To check whether that is the only problem in the process, I tried commenting out that assert clause. The process then finishes, but displays a «deformed» plot.

[Image: feature_importance_windows]

However, the results may not be correct (the assert was probably there for a reason...). It would be nice if the plot size were adjusted accordingly, or could be set beforehand, to avoid this.
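One way the requested size adjustment could look is sketched below. This is not how anamod sizes its figures: the helper window_plot_size, its parameters and the per-row and per-column sizes are all hypothetical. It only shows the idea of scaling the figure with the number of features and timesteps.

```python
def window_plot_size(num_features, sequence_length,
                     row_height=0.5, col_width=0.5, min_size=(4.0, 2.0)):
    """Return a (width, height) tuple in inches that grows with the data.

    Hypothetical helper: one row per feature, one column per timestep,
    never smaller than min_size.
    """
    width = max(min_size[0], col_width * sequence_length)
    height = max(min_size[1], row_height * num_features)
    return (width, height)


# Sizes for the small example in this issue and for a larger dataset.
small = window_plot_size(num_features=6, sequence_length=4)
large = window_plot_size(num_features=40, sequence_length=30)
print(small)  # (4.0, 3.0)
print(large)  # (15.0, 20.0)
```

The resulting tuple could be passed as figsize when creating a matplotlib figure, for example plt.figure(figsize=small).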

Thanks!

franciscomalveiro commented 1 year ago

I took a look at the source code, and found the following:

I've noticed the comment you have left there, so maybe there is something missing in the implementation...?