IIIS-Li-Group / OpenFE

OpenFE: automated feature generation with expert-level performance
MIT License

IndexError: string index out of range #14

Closed: xhaoss closed this issue 1 year ago

xhaoss commented 1 year ago

IndexError                                Traceback (most recent call last)
/tmp/ipykernel_27/17403233.py in <module>
      2
      3 ofe = openfe()
----> 4 features = ofe.fit(data=train_x, label=train_y, n_jobs=10) # generate new features
      5 train_x, test_x = transform(train_x, test_x, features, n_jobs=10) # transform the train and test data according to generated features.

/opt/conda/lib/python3.7/site-packages/openfe/openfe.py in fit(self, data, label, task, train_index, val_index, candidate_features_list, init_scores, categorical_features, metric, drop_columns, n_data_blocks, min_candidate_features, feature_boosting, stage1_metric, stage2_metric, stage2_params, is_stage1, n_repeats, tmp_save_path, n_jobs, seed, verbose)
    300 self.myprint(f"The number of remaining candidate features is {len(self.candidate_features_list)}")
    301 self.myprint("Start stage II selection.")
--> 302 self.new_features_scores_list = self.stage2_select()
    303 self.new_features_list = [feature for feature, _ in self.new_features_scores_list]
    304 for node, score in self.new_features_scores_list:

/opt/conda/lib/python3.7/site-packages/openfe/openfe.py in stage2_select(self)
    529 if self.stage2_metric == 'gain_importance':
    530     for i, imp in enumerate(gbm.feature_importances_[:len(new_features)]):
--> 531         results.append([formula_to_tree(new_features[i]), imp])
    532 elif self.stage2_metric == 'permutation':
    533     r = permutation_importance(gbm, val_x, val_y,

/opt/conda/lib/python3.7/site-packages/openfe/utils.py in formula_to_tree(string)
     52 p1 = find_prev(string[:p2-1])
     53 if string[0] == '(':
---> 54     return Node(string[p2-1], [formula_to_tree(string[p1:p2 - 1]), formula_to_tree(string[p2:-1])])
     55 else:
     56     return Node(string[:p1-1], [formula_to_tree(string[p1:p2 - 1]), formula_to_tree(string[p2:-1])])

/opt/conda/lib/python3.7/site-packages/openfe/utils.py in formula_to_tree(string)
     28
     29 def formula_to_tree(string):
---> 30     if string[-1] != ')':
     31         return FNode(string)
     32

IndexError: string index out of range

ZhangTP1996 commented 1 year ago

Could you please provide the string that causes this error?

AustinCheang commented 1 year ago

Hi, I have encountered the same issue. I printed the generated feature, along with p1 and p2, while running formula_to_tree. This is the output:

feature: Combine(Dependents_3+,Property_Area_Semiurban)
current string: Combine(Dependents_3+,Property_Area_Semiurban
current string: Combine(Dependents_3+
p1: 21
p2: 22

ZhangTP1996 commented 1 year ago

This is because the formula_to_tree function cannot distinguish between the + in the feature name and the + used as an operator. You can replace the + in the feature name with any character other than +, -, *, or /.
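To see how this ends in an IndexError, the reported values can be replayed by hand. The sketch below assumes that p1 and p2 belong to the last printed string, and it reuses only the two slice expressions visible in the utils.py frames of the traceback. It does not call OpenFE itself.

```python
# Values copied from the debug output in this thread.
string = "Combine(Dependents_3+"
p1, p2 = 21, 22

# The two slices that the traceback shows being passed to the
# recursive formula_to_tree calls.
left = string[p1:p2 - 1]
right = string[p2:-1]
print(len(string), repr(left), repr(right))

# formula_to_tree begins with `string[-1] != ')'`, and indexing an
# empty string raises the error reported in this issue.
try:
    left[-1]
except IndexError as exc:
    print(f"IndexError: {exc}")
```

Because the trailing + is treated as an operator, the split point sits at the very end of the string, so both operand slices come out empty and the next recursive call fails on its first line.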

AustinCheang commented 1 year ago

Is there a way I could resolve this? This is one of the features automatically generated by the library, and I did not supply any additional feature set.

ZhangTP1996 commented 1 year ago

The simplest resolution is to rename the feature Dependents_3+ to Dependents_3, that is, to remove the + from the original feature's name before fitting.
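A minimal sketch of that renaming step follows. The helper `build_rename_mapping` is hypothetical and not part of OpenFE; it strips only the four characters named above (+, -, *, /) and refuses to continue if the cleaned names would be empty or collide.

```python
import re

# The four characters identified in this thread as operators.
OPERATOR_PATTERN = re.compile(r"[+\-*/]")


def build_rename_mapping(columns):
    """Map each column name to a copy with + - * / removed.

    Hypothetical helper for illustration, not an OpenFE function.
    """
    mapping = {col: OPERATOR_PATTERN.sub("", str(col)) for col in columns}
    cleaned = list(mapping.values())
    if "" in cleaned:
        raise ValueError("a column name consists only of operator characters")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("two columns would share a name after cleaning")
    return mapping


columns = ["Dependents_3+", "Property_Area_Semiurban"]
mapping = build_rename_mapping(columns)
print(mapping)
```

With pandas DataFrames, the mapping can be applied to both sets before calling ofe.fit, for example `train_x = train_x.rename(columns=mapping)` and likewise for `test_x`, so that the generated features refer to the same names at fit and transform time.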