ebachelet closed 1 year ago
Setting impute=True when instantiating the Classifier should work around this issue. The NaN values are caused by improper feature weighting (apply_weights=True is the default when creating the training set), which I am looking into now.
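To make the idea of imputation concrete, here is a minimal, library-independent sketch that fills each NaN with the median of the finite values in the same column. This thread does not say what MicroLIA does internally when impute=True, so median filling is only an illustrative stand-in; the function name `impute_column_median` and the sample values are hypothetical.

```python
import math
from statistics import median

def impute_column_median(rows):
    """Replace NaN entries with the median of the finite values in that column.

    Illustrative only: this is not MicroLIA's implementation of impute=True.
    """
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        finite = [r[j] for r in rows if not math.isnan(r[j])]
        # A column with no finite values cannot be imputed; it stays NaN.
        fills.append(median(finite) if finite else math.nan)
    return [
        [fills[j] if math.isnan(v) else v for j, v in enumerate(r)]
        for r in rows
    ]

# Hypothetical feature matrix: one row per light curve, one column per feature.
data_x = [
    [1.0, math.nan, 3.0],
    [2.0, 5.0, math.nan],
    [4.0, 7.0, 9.0],
]
imputed = impute_column_median(data_x)
```

Imputation hides the symptom rather than the cause: when a feature is NaN for every row, as can happen with a bug in the feature code, there is nothing to estimate a fill value from.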
Identified the following features where applying weights yielded NaN due to improper division: 'FluxPercentileRatioMid20', 'FluxPercentileRatioMid35', 'FluxPercentileRatioMid50', 'FluxPercentileRatioMid65', 'FluxPercentileRatioMid80'
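As a rough illustration of how a ratio feature can produce NaN, the sketch below uses the commonly cited unweighted definition of FluxPercentileRatioMid20: the 60th minus the 40th flux percentile, divided by the 95th minus the 5th. A constant light curve makes the denominator zero, and in NumPy 0/0 evaluates to NaN. The thread does not show MicroLIA's weighted formula or the actual fix, so the guard and the `fill` parameter here are assumptions.

```python
import math

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])

def flux_percentile_ratio_mid20(flux, fill=0.0):
    """(F60 - F40) / (F95 - F5), guarded against a zero denominator.

    `fill` is a hypothetical fallback value, not something MicroLIA defines.
    """
    s = sorted(flux)
    num = percentile(s, 60) - percentile(s, 40)
    den = percentile(s, 95) - percentile(s, 5)
    if den == 0:
        # Without this guard the ratio would be 0/0 (NaN in NumPy).
        return fill
    return num / den

flat = flux_percentile_ratio_mid20([5.0] * 10)                      # constant flux
ramp = flux_percentile_ratio_mid20([float(i) for i in range(101)])  # linear ramp
```

Whether a fallback value or dropping the light curve is the better choice depends on how the classifier should treat flat light curves.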
The following features were not indexing correctly: 'Gskew', 'MaxSlope', 'mean_second_derivative', 'permutation_entropy', 'time_reversal_asymmetry'
The following feature was missing the apply_weights argument: 'stetsonL'
Source code updated; the fix will be included in version 2.2.7.
I ran into this problem after generating the training set.
    model = ensemble_model.Classifier(data_x, data_y, optimize=True, boruta_trials=25, n_iter=25)
    model.create()
    Running feature selection...
    Boruta with Shapley values failed, switching to original Boruta...
    Running feature selection...
    *** ValueError: Input X contains NaN.
Should I mask any rows with NaN values? Actually, every row has at least one NaN value...
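When every row contains a NaN, masking rows would discard the whole training set, so it can help to count NaNs per feature column first and see whether a few features are responsible. A minimal sketch, assuming the feature matrix is a list of rows and that the feature names are available in column order; the helper `nan_counts_by_column` and the sample values are hypothetical.

```python
import math

def nan_counts_by_column(rows, names):
    """Return {feature_name: NaN count} for the columns that contain any NaN."""
    counts = {name: 0 for name in names}
    for row in rows:
        for name, value in zip(names, row):
            if math.isnan(value):
                counts[name] += 1
    return {name: c for name, c in counts.items() if c > 0}

# Hypothetical three-feature example: one column is NaN in every row.
names = ["Gskew", "MaxSlope", "stetsonL"]
data_x = [
    [0.1, math.nan, 1.2],
    [0.3, math.nan, 0.8],
    [0.2, math.nan, math.nan],
]
bad = nan_counts_by_column(data_x, names)
```

A column whose count equals the number of rows points to a problem in how that feature is computed rather than in individual light curves.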