Open bencoldham opened 5 months ago
Could you please provide a minimal reproducible example with code and data so that I can fix it?
I have the same issue. Here is how my data looks:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76368 entries, 0 to 76367
Data columns (total 36 columns):
0 Marital status 76368 non-null int64
1 Application mode 76368 non-null int64
2 Application order 76368 non-null int64
3 Course 76368 non-null int64
4 Daytime/evening attendance 76368 non-null int64
5 Previous qualification 76368 non-null int64
6 Previous qualification (grade) 76368 non-null float64
7 Nacionality 76368 non-null int64
8 Mother's qualification 76368 non-null int64
9 Father's qualification 76368 non-null int64
10 Mother's occupation 76368 non-null int64
11 Father's occupation 76368 non-null int64
12 Admission grade 76368 non-null float64
13 Displaced 76368 non-null int64
14 Educational special needs 76368 non-null int64
15 Debtor 76368 non-null int64
16 Tuition fees up to date 76368 non-null int64
17 Gender 76368 non-null int64
18 Scholarship holder 76368 non-null int64
19 Age at enrollment 76368 non-null int64
20 International 76368 non-null int64
21 Curricular units 1st sem (credited) 76368 non-null int64
22 Curricular units 1st sem (enrolled) 76368 non-null int64
23 Curricular units 1st sem (evaluations) 76368 non-null int64
24 Curricular units 1st sem (approved) 76368 non-null int64
25 Curricular units 1st sem (grade) 76368 non-null float64
26 Curricular units 1st sem (without evaluations) 76368 non-null int64
27 Curricular units 2nd sem (credited) 76368 non-null int64
28 Curricular units 2nd sem (enrolled) 76368 non-null int64
29 Curricular units 2nd sem (evaluations) 76368 non-null int64
30 Curricular units 2nd sem (approved) 76368 non-null int64
31 Curricular units 2nd sem (grade) 76368 non-null float64
32 Curricular units 2nd sem (without evaluations) 76368 non-null int64
33 Unemployment rate 76368 non-null float64
34 Inflation rate 76368 non-null float64
35 GDP 76368 non-null float64
dtypes: float64(7), int64(29)
memory usage: 21.0 MB
There is also a target column, which is kept separately.
If you need more info, let me know and I will provide it. Thanks.
Here is the full error message:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/openfe/utils.py", line 102, in _cal
    _data = pd.read_feather('./openfe_tmp_data.feather', columns=base_features).set_index('openfe_index')
  File "/opt/conda/lib/python3.10/site-packages/pandas/io/feather_format.py", line 124, in read_feather
    return feather.read_feather(
  File "/opt/conda/lib/python3.10/site-packages/pyarrow/feather.py", line 226, in read_feather
    return (read_table(
  File "/opt/conda/lib/python3.10/site-packages/pyarrow/feather.py", line 262, in read_table
    table = reader.read_names(columns)
  File "pyarrow/_feather.pyx", line 114, in pyarrow._feather.FeatherReader.read_names
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Field named credited is not found
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/openfe/utils.py", line 102, in _cal
    _data = pd.read_feather('./openfe_tmp_data.feather', columns=base_features).set_index('openfe_index')
  File "/opt/conda/lib/python3.10/site-packages/pandas/io/feather_format.py", line 124, in read_feather
    return feather.read_feather(
  File "/opt/conda/lib/python3.10/site-packages/pyarrow/feather.py", line 226, in read_feather
    return (read_table(
  File "/opt/conda/lib/python3.10/site-packages/pyarrow/feather.py", line 262, in read_table
    table = reader.read_names(columns)
  File "pyarrow/_feather.pyx", line 114, in pyarrow._feather.FeatherReader.read_names
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Field named credited is not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/opt/conda/lib/python3.10/site-packages/openfe/utils.py", line 111, in _cal
    exit()
NameError: name 'exit' is not defined
"""
The above exception was the direct cause of the following exception:
NameError                                 Traceback (most recent call last)
Cell In[62], line 2
      1 #transform the train and test data according to generated features.
----> 2 f_x_train, f_x_val = transform(x_train, x_val, features, n_jobs=1)

File /opt/conda/lib/python3.10/site-packages/openfe/utils.py:147, in transform(X_train, X_test, new_features_list, n_jobs, name)
    145 cat_feats = []
    146 for i, res in enumerate(results):
--> 147     is_cat, d1, d2, f = res.result()
    148     names.append('autoFEf%d' % i + name)
    149     names_map['autoFEf%d' % i + name] = f

File /opt/conda/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
    449     raise CancelledError()
    450 elif self._state == FINISHED:
--> 451     return self.__get_result()
    453 self._condition.wait(timeout)
    455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File /opt/conda/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
    401 if self._exception:
    402     try:
--> 403         raise self._exception
    404     finally:
    405         # Break a reference cycle with the exception in self._exception
    406         self = None

NameError: name 'exit' is not defined
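A note on reading the traceback: the final NameError hides the real failure, which is the ArrowInvalid raised in the worker. The bare exit() name is installed by Python's site module for interactive use and is not guaranteed to exist in every interpreter context, so calling it from library code can fail this way. The sketch below is only an illustration of the general pattern of re-raising inside a worker so that the original exception reaches the caller. run_task and read_missing_column are hypothetical names, not part of openfe, and a KeyError stands in for ArrowInvalid.

```python
import sys


def run_task(task, *args):
    """Hypothetical worker wrapper: report the failure, then re-raise it."""
    try:
        return task(*args)
    except Exception as exc:
        # Log to stderr so the message is visible even in a child process.
        print(f"worker failed: {exc!r}", file=sys.stderr)
        # A bare `raise` preserves the original exception type and message,
        # unlike calling `exit()`, which may not be defined at all.
        raise


def read_missing_column():
    # Stand-in for the failing read; KeyError plays the role of ArrowInvalid.
    raise KeyError("credited")
```

With this pattern, the exception surfaced by Future.result() would be the underlying read error rather than a NameError.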
It seems that the error comes from _data = pd.read_feather('./openfe_tmp_data.feather', columns=base_features).set_index('openfe_index'), but credited is not in the columns. Are you trying to run multiple openfe processes on the same machine?
Only one openfe process. I am running it in a Kaggle notebook.
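A possible workaround, under an unconfirmed assumption: the missing field name credited matches the parenthesized part of columns such as "Curricular units 1st sem (credited)", which suggests that special characters in column names may be misread as part of a feature expression. If that is the cause, renaming the columns to plain identifiers before fit and transform might avoid the error. The helper below, safe_column_names, is a hypothetical name and not part of openfe. It only builds a rename mapping from a list of column names.

```python
import re


def safe_column_names(columns):
    """Map each column name to a unique name made of letters, digits and '_'."""
    mapping = {}
    seen = set()
    for col in columns:
        # Collapse every run of non-word characters (spaces, quotes,
        # slashes, parentheses) into a single underscore.
        new = re.sub(r"\W+", "_", str(col)).strip("_") or "col"
        # Add a numeric suffix if two columns collapse to the same name.
        base, k = new, 1
        while new in seen:
            k += 1
            new = f"{base}_{k}"
        seen.add(new)
        mapping[col] = new
    return mapping


mapping = safe_column_names([
    "Curricular units 1st sem (credited)",
    "Mother's qualification",
    "Daytime/evening attendance",
    "GDP",
])
```

The mapping could then be applied identically to both frames, for example x_train.rename(columns=mapping) and x_val.rename(columns=mapping), before the features are generated, so that the names seen during fit and transform agree.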
The following code produced an error on the transform function. The fit function works correctly. This error is reproduced for every feature in the original dataset.