Closed lawrenceadams closed 1 year ago
Hey Lawrence,
Could you provide some steps to reproduce the issues/errors? Is the list of packages you've put there the one where the notebook is working?
It's definitely good to have uniformity. I don't have much of a preference, but if we are to use conda it's probably worth justifying it over pip; the value add for conda doesn't seem to be there for this use case.
Thanks @medic-code ,
Unsure exactly where - Chris had the issue, but the offending traceback is here:
prep_df = pd.DataFrame(X_train_prep)
prep_df.columns = pipeline.get_feature_names_out()
prep_df.head()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [57], in ()
1 prep_df = pd.DataFrame(X_train_prep)
----> 2 prep_df.columns = pipeline.get_feature_names_out()
3 prep_df.head()
File ~/Library/miniconda3/envs/data_science/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:481, in ColumnTransformer.get_feature_names_out(self, input_features)
479 transformer_with_feature_names_out = []
480 for name, trans, column, _ in self._iter(fitted=True):
--> 481 feature_names_out = self._get_feature_name_out_for_transformer(
482 name, trans, column, input_features
483 )
484 if feature_names_out is None:
485 continue
File ~/Library/miniconda3/envs/data_science/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:446, in ColumnTransformer._get_feature_name_out_for_transformer(self, name, trans, column, feature_names_in)
444 # An actual transformer
445 if not hasattr(trans, "get_feature_names_out"):
--> 446 raise AttributeError(
447 f"Transformer {name} (type {type(trans).__name__}) does "
448 "not provide get_feature_names_out."
449 )
450 if isinstance(column, Iterable) and not all(
451 isinstance(col, str) for col in column
452 ):
453 column = _safe_indexing(feature_names_in, column)
AttributeError: Transformer lab (type OrdinalEncoder) does not provide get_feature_names_out.
In reality I think this is fine ~ I can't recreate it, but I still think we should have clearly pinned versions of packages, for ease of maintainability and to prevent issues for students.
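For anyone trying to work out whether their environment is affected, here is a minimal sketch of a check. It mirrors the `hasattr(trans, "get_feature_names_out")` test visible in the traceback; the helper name `transformers_missing_feature_names` is made up for illustration and is not part of scikit-learn or the repo.

```python
from importlib import metadata


def transformers_missing_feature_names(named_transformers):
    """Return the names of transformers that lack get_feature_names_out.

    named_transformers is a list of (name, transformer) pairs, e.g. the
    steps passed to a ColumnTransformer. Hypothetical helper, not a
    scikit-learn API.
    """
    return [
        name
        for name, trans in named_transformers
        if not hasattr(trans, "get_feature_names_out")
    ]


if __name__ == "__main__":
    try:
        from sklearn.preprocessing import OrdinalEncoder
    except ImportError:
        print("scikit-learn is not installed in this environment")
    else:
        # Report the installed version alongside the result, so the two
        # can be compared across machines.
        print("scikit-learn", metadata.version("scikit-learn"))
        missing = transformers_missing_feature_names([("lab", OrdinalEncoder())])
        print("transformers without get_feature_names_out:", missing)
```

If `lab` shows up in the printed list, the installed scikit-learn is one where `OrdinalEncoder` does not provide the method, which would match the error above.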
@chris-lovejoy Are you still able to reproduce it?
Yep, I think it's definitely worth having pinned versions of packages. Probably worth picking the package versions currently being used, making sure the rest of the notebooks work with them, and sticking with those version numbers.
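One way to capture the versions currently in use, as a rough sketch: read them from the working environment with the standard-library `importlib.metadata` and print `name==version` lines that could be pasted into a requirements file. The function name `pin_lines` and the package list are assumptions for illustration; swap in whatever the notebooks actually import.

```python
from importlib import metadata


def pin_lines(packages, lookup=metadata.version):
    """Build 'name==version' lines for each package in the list.

    lookup defaults to importlib.metadata.version; it is a parameter
    only so the function can be exercised without the packages installed.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible marker rather than silently dropping it.
            lines.append(f"# {name} is not installed here")
    return lines


if __name__ == "__main__":
    # Hypothetical package list, not taken from the repo.
    print("\n".join(pin_lines(["pandas", "scikit-learn", "numpy"])))
```

Running this in an environment where the notebooks are known to work gives a starting set of pins to test the remaining notebooks against.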
Sorry for slow response with this.
I'm not getting the error that I was before, but when I ran it just now, I did get the error at the bottom.
I think we should go with conda, as it's more native for jupyter notebooks.
In terms of versions, below is a full list of my current versions.
We probably just need to make an environment.yml file for the repo, and provide instructions on how people can install it for their jupyter notebook environment? Or perhaps we make a file for both pip and conda, with instructions for each?
prep_df = pd.DataFrame(X_train_prep)
prep_df.columns = pipeline.get_feature_names_out()
prep_df.head()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [25], in <cell line: 2>()
1 prep_df = pd.DataFrame(X_train_prep)
----> 2 prep_df.columns = pipeline.get_feature_names_out()
3 prep_df.head()
File ~/Library/miniconda3/envs/data_science/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:481, in ColumnTransformer.get_feature_names_out(self, input_features)
479 transformer_with_feature_names_out = []
480 for name, trans, column, _ in self._iter(fitted=True):
--> 481 feature_names_out = self._get_feature_name_out_for_transformer(
482 name, trans, column, input_features
483 )
484 if feature_names_out is None:
485 continue
File ~/Library/miniconda3/envs/data_science/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:446, in ColumnTransformer._get_feature_name_out_for_transformer(self, name, trans, column, feature_names_in)
444 # An actual transformer
445 if not hasattr(trans, "get_feature_names_out"):
--> 446 raise AttributeError(
447 f"Transformer {name} (type {type(trans).__name__}) does "
448 "not provide get_feature_names_out."
449 )
450 if isinstance(column, Iterable) and not all(
451 isinstance(col, str) for col in column
452 ):
453 column = _safe_indexing(feature_names_in, column)
AttributeError: Transformer lab (type OrdinalEncoder) does not provide get_feature_names_out.
Have decided to go with pip, as it's somewhat simpler and @V-Sher has written a great description for the readme. Haven't had any further issues with code not running due to different versions - can re-raise the issue if this pops up again.
@chris-lovejoy and I have had some issues with getting notebooks to run due to differing pandas versions etc.
Concretely, here are my relevant packages:
Should we pin these to a specific version, and have a prompt to install the right version on each run?
Also, are we using conda or pip? If using Colab I imagine this doesn't really matter ~ but good to get uniformity across notebooks
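A "prompt on each run" could be sketched roughly like this: a first notebook cell that compares installed versions against the pins and prints a pip command if anything is off. The version numbers in `PINNED` are placeholders, not the versions used in the repo, and the function names are made up for illustration.

```python
from importlib import metadata

# Placeholder pins for illustration only; not the repo's real versions.
PINNED = {"pandas": "1.4.2", "scikit-learn": "1.1.1"}


def version_mismatches(pinned, lookup=metadata.version):
    """Return {package: (wanted, found)} for every package that is off-pin.

    found is None when the package is not installed at all.
    """
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


def install_hint(problems):
    """Format a pip command that would restore the pinned versions."""
    if not problems:
        return ""
    specs = " ".join(f"{name}=={wanted}" for name, (wanted, _) in problems.items())
    return f"pip install {specs}"


if __name__ == "__main__":
    problems = version_mismatches(PINNED)
    if problems:
        print("Version mismatch:", problems)
        print("Suggested fix:", install_hint(problems))
```

This only warns rather than installing anything, which keeps the cell safe to run on Colab as well as locally.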