NaN values in training set

Professor-G / MicroLIA

Gravitational microlensing classification engine using machine learning

GNU General Public License v3.0


NaN values in training set #18

Closed ebachelet closed 1 year ago

ebachelet commented 1 year ago

I ran into this problem after generating the training set.

model = ensemble_model.Classifier(data_x, data_y, optimize=True, boruta_trials=25, n_iter=25)
model.create()
Running feature selection...
Boruta with Shapley values failed, switching to original Boruta...
Running feature selection...
*** ValueError: Input X contains NaN.

Should I mask the rows that contain NaN values? Actually, every row has at least one NaN value...
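Before masking anything, it can help to check whether the NaN values are spread across the table or concentrated in a few feature columns. The following is a minimal, dependency-free sketch of such a check; `nan_report` and `toy_x` are hypothetical names with made-up values and are not part of MicroLIA.

```python
import math

def nan_report(rows):
    """Return (rows_with_nan, nan_count_per_column) for a 2-D table of floats."""
    n_cols = len(rows[0])
    per_column = [0] * n_cols
    rows_with_nan = 0
    for row in rows:
        flags = [math.isnan(v) for v in row]
        rows_with_nan += any(flags)      # True counts as 1
        for j, is_nan in enumerate(flags):
            per_column[j] += is_nan
    return rows_with_nan, per_column

# Toy feature matrix (made-up values): column 1 is NaN in every row.
toy_x = [
    [0.5, math.nan, 1.2],
    [0.7, math.nan, 0.9],
    [0.1, math.nan, math.nan],
]
rows_with_nan, per_column = nan_report(toy_x)
print(rows_with_nan, per_column)
```

If a handful of columns account for most of the NaN values, as in this toy case, dropping rows would discard the whole training set while the real problem sits in specific features.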

Professor-G commented 1 year ago

Setting impute=True when instantiating the Classifier should deal with this issue. The NaN values are due to improper feature weighting (apply_weights=True by default when creating the training set); I am looking into this now.
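For readers unfamiliar with imputation, the idea is to replace each missing value with an estimate derived from the observed values. The sketch below uses a per-column median, which is one generic strategy and not necessarily the one MicroLIA applies when impute=True is set; `impute_column_median` and `toy_x` are hypothetical, and the 0.0 fallback for an all-NaN column is an arbitrary choice made for illustration.

```python
import math
import statistics

def impute_column_median(rows):
    """Replace NaN entries with the median of the observed values in that column."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumption: fall back to 0.0 when a column has no observed values at all.
        medians.append(statistics.median(observed) if observed else 0.0)
    return [
        [medians[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]

# Toy data (made-up values) with one NaN in each column.
toy_x = [[1.0, math.nan], [3.0, 4.0], [math.nan, 8.0]]
filled = impute_column_median(toy_x)
print(filled)
```

Imputation hides the symptom so that training can proceed, but it does not address why the features came out as NaN in the first place.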

Professor-G commented 1 year ago

Identified the following features where applying weights yielded NaN due to improper division: 'FluxPercentileRatioMid20', 'FluxPercentileRatioMid35', 'FluxPercentileRatioMid50', 'FluxPercentileRatioMid65', 'FluxPercentileRatioMid80'

The following features were not indexing correctly: 'Gskew', 'MaxSlope', 'mean_second_derivative', 'permutation_entropy', 'time_reversal_asymmetry'

The following feature was missing the apply_weights argument: 'stetsonL'
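To illustrate the "improper division" case in general terms: a ratio feature becomes undefined when its denominator is zero or when either operand is already NaN. The sketch below shows one possible guard. It is not the fix applied in MicroLIA (the thread does not show it); `safe_ratio` and its `fallback` parameter are hypothetical, and whether a fallback value is appropriate depends on the feature.

```python
import math

def safe_ratio(numerator, denominator, fallback=0.0):
    """Return numerator / denominator, or `fallback` when the ratio is undefined."""
    if math.isnan(numerator) or math.isnan(denominator) or denominator == 0.0:
        return fallback
    return numerator / denominator

print(safe_ratio(1.0, 4.0))                      # well-defined ratio
print(safe_ratio(0.0, 0.0))                      # zero denominator, fallback used
print(safe_ratio(math.nan, 2.0, fallback=-1.0))  # NaN operand, fallback used
```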

The source code has been updated; the fix will be included in version 2.2.7.