Package Versions, Package Managers?

lawrenceadams commented 1 year ago

@chris-lovejoy and I have had some issues getting notebooks to run due to differing pandas versions etc.

Concretely, here are my relevant packages:

(coding_for_medicine)  [truncated] > conda list | grep -E 'pandas|numpy|seaborn|matplotlib|imbalanced-learn'
imbalanced-learn          0.9.1                    pypi_0    pypi
matplotlib                3.6.1           py310h2ec42d9_0    conda-forge
matplotlib-base           3.6.1           py310he725631_0    conda-forge
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
numpy                     1.23.4          py310h1b7c290_0    conda-forge
pandas                    1.5.1           py310hecf8f37_0    conda-forge
seaborn                   0.11.2               hd8ed1ab_0    conda-forge
seaborn-base              0.11.2             pyhd8ed1ab_0    conda-forge

Should we pin these to a specific version, and have a prompt to install the right version on each run?
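One way the "prompt on each run" idea could look is a small check at the top of each notebook. This is only a sketch: the `PINNED` mapping is a hypothetical set of pins copied from the `conda list` output above, not versions the project has agreed on, and `find_mismatches` and `install_hint` are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pins for illustration, copied from the `conda list` output above.
PINNED = {
    "pandas": "1.5.1",
    "numpy": "1.23.4",
    "seaborn": "0.11.2",
    "matplotlib": "3.6.1",
    "imbalanced-learn": "0.9.1",
}


def find_mismatches(pinned, get_version=version):
    """Return {package: (expected, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


def install_hint(mismatches):
    """Build a pip command that would install the pinned versions."""
    specs = " ".join(
        f"{name}=={expected}" for name, (expected, _) in sorted(mismatches.items())
    )
    return f"pip install {specs}" if specs else ""


if __name__ == "__main__":
    found = find_mismatches(PINNED)
    if found:
        print("Version mismatches:", found)
        print("Suggested fix:", install_hint(found))
```

The check only reports and suggests a command rather than installing anything itself, so a student stays in control of their environment.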

Also, are we using conda or pip? If we're using Colab I imagine this doesn't really matter, but it would be good to get uniformity across notebooks.

medic-code commented 1 year ago

Hey Lawrence,

Could you provide some steps to reproduce the issues/errors? Is the list of packages you've posted the set with which the notebook works?

It's definitely good to have uniformity. I don't have much of a preference, but if we are to use conda it's probably worth justifying it over pip. The value add for conda doesn't seem to be there for this use case.

lawrenceadams commented 1 year ago

Thanks @medic-code ,

Unsure exactly where - Chris had the issue, but the offending traceback is here:

prep_df = pd.DataFrame(X_train_prep)
prep_df.columns = pipeline.get_feature_names_out()
prep_df.head()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [57], in ()
      1 prep_df = pd.DataFrame(X_train_prep)
----> 2 prep_df.columns = pipeline.get_feature_names_out()
      3 prep_df.head()

File ~/Library/miniconda3/envs/data_science/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:481, in ColumnTransformer.get_feature_names_out(self, input_features)
    479 transformer_with_feature_names_out = []
    480 for name, trans, column, _ in self._iter(fitted=True):
--> 481     feature_names_out = self._get_feature_name_out_for_transformer(
    482         name, trans, column, input_features
    483     )
    484     if feature_names_out is None:
    485         continue

File ~/Library/miniconda3/envs/data_science/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:446, in ColumnTransformer._get_feature_name_out_for_transformer(self, name, trans, column, feature_names_in)
    444 # An actual transformer
    445 if not hasattr(trans, "get_feature_names_out"):
--> 446     raise AttributeError(
    447         f"Transformer {name} (type {type(trans).__name__}) does "
    448         "not provide get_feature_names_out."
    449     )
    450 if isinstance(column, Iterable) and not all(
    451     isinstance(col, str) for col in column
    452 ):
    453     column = _safe_indexing(feature_names_in, column)

AttributeError: Transformer lab (type OrdinalEncoder) does not provide get_feature_names_out.
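The traceback shows the check failing on `hasattr(trans, "get_feature_names_out")`, which would be consistent with an installed scikit-learn whose `OrdinalEncoder` lacks that method, though the thread does not confirm the version. A rough diagnostic, assuming `pipeline` is the fitted `ColumnTransformer` from the notebook (the helper name below is made up for illustration), could list which steps are affected:

```python
def transformers_without_feature_names(fitted_transformers):
    """List (step name, type name) for steps lacking get_feature_names_out.

    `fitted_transformers` is an iterable of (name, transformer, columns)
    triples, the shape of a fitted ColumnTransformer's `transformers_`.
    """
    missing = []
    for name, trans, _columns in fitted_transformers:
        if isinstance(trans, str):
            continue  # "drop" / "passthrough" placeholders are not transformers
        if not hasattr(trans, "get_feature_names_out"):
            missing.append((name, type(trans).__name__))
    return missing
```

In the notebook this would be called as `transformers_without_feature_names(pipeline.transformers_)`, ideally alongside `print(sklearn.__version__)` so the result can be compared between machines.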

In reality I think this is fine ~ I can't recreate it, but I still think we should have pinned, clearly stated package versions for ease of maintenance and to prevent issues for students

medic-code commented 1 year ago

@chris-lovejoy Are you still able to reproduce it?

Yep, I think it's definitely worth having pinned versions of packages. Probably worth taking the package versions currently in use, making sure the rest of the notebooks work with them, and sticking with those version numbers.
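Capturing "the packages currently being used" could be done from a working environment with something like the sketch below. The function name and the package list in the usage note are assumptions for illustration; only `importlib.metadata` from the standard library is used.

```python
from importlib.metadata import PackageNotFoundError, version


def freeze_selected(packages, get_version=version):
    """Return (pin_lines, missing) for the named packages.

    pin_lines are sorted 'name==version' strings for installed packages;
    missing lists any requested package that is not installed.
    """
    pins, missing = [], []
    for name in packages:
        try:
            pins.append(f"{name}=={get_version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return sorted(pins), missing
```

For example, `freeze_selected(["pandas", "numpy", "seaborn", "matplotlib", "imbalanced-learn"])` run in an environment where every notebook works would give the lines to commit as pins.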

chris-lovejoy commented 1 year ago

Sorry for slow response with this.

I'm not getting the error that I was before, but when I ran it just now, I did get the error at the bottom.

I think we should go with conda, as it's more native for Jupyter notebooks.

In terms of versions, below is a full list of my current versions.

We probably just need to make an environment.yml file for the repo, and provide instructions on how people can install it for their Jupyter notebook environment? Or perhaps we make a file for both pip and conda, with instructions for each?
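If both files are wanted, they could be generated from one set of pins so they never drift apart. The sketch below is an assumption about how that might be done, not what the repo does: the function names are invented, and the environment name, Python version, channel and package versions in the usage note are taken from the `conda list` output earlier in this thread purely as an example.

```python
def render_requirements(pins):
    """Render {name: version} as requirements.txt content for pip."""
    return "\n".join(f"{n}=={v}" for n, v in sorted(pins.items())) + "\n"


def render_environment_yml(env_name, python_version, conda_pins, pip_pins=None):
    """Render a conda environment.yml, with an optional nested pip section."""
    lines = [
        f"name: {env_name}",
        "channels:",
        "  - conda-forge",
        "dependencies:",
        f"  - python={python_version}",
    ]
    # conda pins use a single '=' between name and version
    lines += [f"  - {n}={v}" for n, v in sorted(conda_pins.items())]
    if pip_pins:
        # packages only available from PyPI go under a nested pip list
        lines += ["  - pip", "  - pip:"]
        lines += [f"    - {n}=={v}" for n, v in sorted(pip_pins.items())]
    return "\n".join(lines) + "\n"
```

As a usage example, `render_environment_yml("coding_for_medicine", "3.10", {"pandas": "1.5.1", "numpy": "1.23.4"}, {"imbalanced-learn": "0.9.1"})` produces a file that `conda env create -f environment.yml` can consume, while the output of `render_requirements` works with `pip install -r requirements.txt`.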

prep_df = pd.DataFrame(X_train_prep)
prep_df.columns = pipeline.get_feature_names_out()
prep_df.head()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [25], in <cell line: 2>()
      1 prep_df = pd.DataFrame(X_train_prep)
----> 2 prep_df.columns = pipeline.get_feature_names_out()
      3 prep_df.head()

File ~/Library/miniconda3/envs/data_science/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:481, in ColumnTransformer.get_feature_names_out(self, input_features)
    479 transformer_with_feature_names_out = []
    480 for name, trans, column, _ in self._iter(fitted=True):
--> 481     feature_names_out = self._get_feature_name_out_for_transformer(
    482         name, trans, column, input_features
    483     )
    484     if feature_names_out is None:
    485         continue

File ~/Library/miniconda3/envs/data_science/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py:446, in ColumnTransformer._get_feature_name_out_for_transformer(self, name, trans, column, feature_names_in)
    444 # An actual transformer
    445 if not hasattr(trans, "get_feature_names_out"):
--> 446     raise AttributeError(
    447         f"Transformer {name} (type {type(trans).__name__}) does "
    448         "not provide get_feature_names_out."
    449     )
    450 if isinstance(column, Iterable) and not all(
    451     isinstance(col, str) for col in column
    452 ):
    453     column = _safe_indexing(feature_names_in, column)

AttributeError: Transformer lab (type OrdinalEncoder) does not provide get_feature_names_out.

chris-lovejoy commented 1 year ago

Have decided to go with pip, as it's somewhat simpler and @V-Sher has written a great description for the README. Haven't had any further issues with code not running due to different versions - can re-raise the issue if this pops up again.

chris-lovejoy / CodingForMedicine

Package Versions, Package Managers? #12