Closed xhaoss closed 1 year ago
Could you please provide the string that causes this error?
Hi, I have encountered the same issue. I printed the generated feature, p1, and p2 while running formula_to_tree. This is the output:
feature: Combine(Dependents_3+,Property_Area_Semiurban)
current string: Combine(Dependents_3+,Property_Area_Semiurban
current string: Combine(Dependents_3+
p1: 21
p2: 22
This is because the formula_to_tree function cannot distinguish between the + in the feature name and the + used as an operator. You can replace the + in the feature name with a character other than +, -, *, or /.
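To find out which columns would trip the parser before fitting, a quick check along these lines may help. This is only a sketch: `OPERATOR_CHARS` and `find_unsafe_names` are hypothetical names, not part of OpenFE, the character set is just the `+-*/` mentioned above, and `Loan-Amount` is a made-up column added for illustration.

```python
# Characters named in this thread as clashing with operators.
# Assumption: these four are the only ones that matter to the parser.
OPERATOR_CHARS = set("+-*/")


def find_unsafe_names(names):
    """Return the names that contain at least one operator character."""
    return [name for name in names if OPERATOR_CHARS & set(name)]


# "Loan-Amount" is an invented example; the other two come from the output above.
columns = ["Dependents_3+", "Property_Area_Semiurban", "Loan-Amount"]
unsafe = find_unsafe_names(columns)
print(unsafe)
```

With a pandas DataFrame, the same check could be run on `list(train_x.columns)`.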
Is there a way I could resolve this? This is one of the features automatically generated by the library, and I did not impose any additional feature set.
The simplest resolution is to rename the original feature Dependents_3+ to Dependents_3 (that is, remove the + from the original feature name).
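If several columns are affected, the renaming can be done in one pass. The sketch below is not OpenFE code: `sanitize_feature_names` is a hypothetical helper, and it assumes that deleting the characters `+-*/` is enough to keep the parser happy. It also avoids collisions, for example when both Dependents_3+ and Dependents_3 already exist.

```python
def sanitize_feature_names(names, unsafe_chars="+-*/"):
    """Map each original name to a name free of `unsafe_chars` (hypothetical helper)."""
    table = str.maketrans("", "", unsafe_chars)  # translation table that deletes the characters
    # Names that are already clean keep their spelling, so reserve them first.
    taken = {name for name in names if name.translate(table) == name}
    mapping = {}
    for name in names:
        base = name.translate(table) or "feature"  # fallback if nothing is left
        if base == name:
            mapping[name] = name
            continue
        # Append a numeric suffix until the cleaned name is unique.
        candidate, suffix = base, 1
        while candidate in taken:
            candidate = f"{base}_{suffix}"
            suffix += 1
        taken.add(candidate)
        mapping[name] = candidate
    return mapping


mapping = sanitize_feature_names(
    ["Dependents_3+", "Dependents_3", "Property_Area_Semiurban"]
)
print(mapping)
```

Assuming train_x and test_x are pandas DataFrames, the mapping could then be applied to both with `train_x.rename(columns=mapping)` and `test_x.rename(columns=mapping)` before calling `ofe.fit`.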
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_27/17403233.py in
      2
      3 ofe = openfe()
----> 4 features = ofe.fit(data=train_x, label=train_y, n_jobs=10) # generate new features
      5 train_x, test_x = transform(train_x, test_x, features, n_jobs=10) # transform the train and test data according to generated features.

/opt/conda/lib/python3.7/site-packages/openfe/openfe.py in fit(self, data, label, task, train_index, val_index, candidate_features_list, init_scores, categorical_features, metric, drop_columns, n_data_blocks, min_candidate_features, feature_boosting, stage1_metric, stage2_metric, stage2_params, is_stage1, n_repeats, tmp_save_path, n_jobs, seed, verbose)
    300 self.myprint(f"The number of remaining candidate features is {len(self.candidate_features_list)}")
    301 self.myprint("Start stage II selection.")
--> 302 self.new_features_scores_list = self.stage2_select()
    303 self.new_features_list = [feature for feature, _ in self.new_features_scores_list]
    304 for node, score in self.new_features_scores_list:

/opt/conda/lib/python3.7/site-packages/openfe/openfe.py in stage2_select(self)
    529 if self.stage2_metric == 'gain_importance':
    530     for i, imp in enumerate(gbm.feature_importances_[:len(new_features)]):
--> 531         results.append([formula_to_tree(new_features[i]), imp])
    532 elif self.stage2_metric == 'permutation':
    533     r = permutation_importance(gbm, val_x, val_y,

/opt/conda/lib/python3.7/site-packages/openfe/utils.py in formula_to_tree(string)
     52 p1 = find_prev(string[:p2-1])
     53 if string[0] == '(':
---> 54     return Node(string[p2-1], [formula_to_tree(string[p1:p2 - 1]), formula_to_tree(string[p2:-1])])
     55 else:
     56     return Node(string[:p1-1], [formula_to_tree(string[p1:p2 - 1]), formula_to_tree(string[p2:-1])])

/opt/conda/lib/python3.7/site-packages/openfe/utils.py in formula_to_tree(string)
     28
     29 def formula_to_tree(string):
---> 30     if string[-1] != ')':
     31         return FNode(string)
     32

IndexError: string index out of range