anthony-wang / BestPractices

Things that you should (and should not) do in your Materials Informatics research.
https://doi.org/10.1021/acs.chemmater.0c01907
MIT License

pandas-profiling Profile Report #5

Closed sgbaird closed 3 years ago

sgbaird commented 4 years ago

I've been getting an error with 1-data_loading_cleanup_processing.ipynb.

In:

profile = ProfileReport(df.copy(), title='Pandas Profiling Report of Cp dataset', html={'style':{'full_width':True}})
profile.to_widgets()

Out:

KeyError: 'Requested level (var1) does not match index name (None)'

Upgrading the pandas-profiling version to 2.8.0 via pip install pandas-profiling --upgrade seems to have fixed this.

Note: before upgrading, pip freeze gives pandas-profiling==2.4.0.
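A minimal sketch for checking the installed version before deciding whether to upgrade. It assumes Python 3.8+ (for importlib.metadata) and treats 2.8.0 as the working version only because that is what worked in this report; the helper names (parse_version, needs_upgrade, installed_version) are made up for illustration and are not part of pandas-profiling.

```python
from importlib import metadata

# Assumption: 2.8.0 is used as the threshold only because it worked in this report.
MIN_WORKING = (2, 8, 0)


def parse_version(text):
    """Turn a string like '2.4.0' into (2, 4, 0).

    Parsing stops at the first non-numeric component, and the result is
    padded with zeros to three parts so that '2.8' compares equal to '2.8.0'.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed, minimum=MIN_WORKING):
    """Return True if the installed version string is older than `minimum`."""
    return parse_version(installed) < minimum


def installed_version(dist="pandas-profiling"):
    """Return the installed version of `dist`, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


current = installed_version()
if current is None:
    print("pandas-profiling is not installed in this environment")
elif needs_upgrade(current):
    print(f"pandas-profiling {current} found; consider: pip install pandas-profiling --upgrade")
else:
    print(f"pandas-profiling {current} found; no upgrade needed for this issue")
```

The comparison is done on integer tuples rather than on strings, so that a version such as 2.10.0 is correctly treated as newer than 2.8.0.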

Full trace:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-5-1fa856a2164e> in <module>
----> 1 profile = ProfileReport(df.copy(), title='Pandas Profiling Report of Cp dataset', html={'style':{'full_width':True}})
      2 profile.to_widgets()

~\anaconda3\envs\bestpractices\lib\site-packages\pandas_profiling\__init__.py in __init__(self, df, minimal, config_file, **kwargs)
     67 
     68         # Get dataset statistics
---> 69         description_set = describe_df(df)
     70 
     71         # Build report structure

~\anaconda3\envs\bestpractices\lib\site-packages\pandas_profiling\model\describe.py in describe(df)
    535 
    536     # Get correlations
--> 537     correlations = calculate_correlations(df, variables)
    538 
    539     # Transform the series_description in a DataFrame

~\anaconda3\envs\bestpractices\lib\site-packages\pandas_profiling\model\correlations.py in calculate_correlations(df, variables)
    191                     # Get the Phi_k sorted order
    192                     current_order = (
--> 193                         correlations["phi_k"].index.get_level_values("var1").tolist()
    194                     )
    195 

~\anaconda3\envs\bestpractices\lib\site-packages\pandas\core\indexes\base.py in _get_level_values(self, level)
   1477         Index(['a', 'b', 'c'], dtype='object')
   1478         """
-> 1479         self._validate_index_level(level)
   1480         return self
   1481 

~\anaconda3\envs\bestpractices\lib\site-packages\pandas\core\indexes\base.py in _validate_index_level(self, level)
   1414         elif level != self.name:
   1415             raise KeyError(
-> 1416                 f"Requested level ({level}) does not match index name ({self.name})"
   1417             )
   1418 

KeyError: 'Requested level (var1) does not match index name (None)'

anthony-wang commented 3 years ago

Hey @sgbaird, thanks for noticing this, and sorry for the late response! I will look into this, and if necessary I will update the environment file 🙂

anthony-wang commented 3 years ago

I just took a look, and it worked for me from a fresh pull of the master branch. My pandas_profiling version is still at 2.4.0:

print(pandas_profiling.__version__)

2.4.0

@sgbaird, did you create your Python environment with Anaconda, following the steps described in the README, or did you install the packages with pip?
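For reference, a small sketch of how the relevant environment details could be collected and pasted into a reply. It uses only the standard library; describe_environment is a hypothetical helper, and the conda checks (the CONDA_DEFAULT_ENV variable and a conda-meta directory under sys.prefix) are heuristics rather than a definitive test of how the environment was created.

```python
import os
import sys
from importlib import metadata


def describe_environment(packages=("pandas", "pandas-profiling")):
    """Collect basic facts about the running interpreter and selected packages.

    The conda-related fields are heuristics: CONDA_DEFAULT_ENV is normally set
    by `conda activate`, and conda environments usually contain a `conda-meta`
    directory under their prefix.
    """
    info = {
        "python": sys.version.split()[0],
        "prefix": sys.prefix,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "has_conda_meta": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package is not installed in this environment
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Reporting the pandas version alongside pandas-profiling may help, since the traceback ends inside pandas' own index code.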

anthony-wang commented 3 years ago

It's been a month without a response on this issue, so I'll close it for now. Let me know if you still face the problem! @sgbaird