VlachosGroup / pMuTT

Python Multiscale Thermochemistry Toolbox (pMuTT)
https://vlachosgroup.github.io/pMuTT/

xlrd, used by Pandas to read Excel files, no longer supports .xlsx Excel workbook files #183

Open wittregr opened 3 years ago

wittregr commented 3 years ago

Version of pMuTT pmutt 1.2.21

Describe the bug Version 2.0+ of xlrd no longer supports reading Excel .xlsx files, the default workbook format for current versions of Excel. Pandas uses xlrd to read Excel files, so reading Excel sheets with pmutt I/O fails.

To Reproduce Run conda install xlrd (this installs v2.0.1, which does not support .xlsx files), then use pmutt to read data from a spreadsheet.

Additional context Short-term workarounds:

  1. Save Excel spreadsheets using the Excel 97-2003 Workbook format. This will save in .xls format and should still be readable
  2. Install an older version of xlrd: conda install xlrd=1.2.0. There is a warning that this could introduce a security issue, but xlrd 1.2.0 will continue to read .xlsx files.
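To tell which situation an environment is in, a quick check along these lines may help. It is only a sketch: the helper names xlrd_reads_xlsx and installed_xlrd_version are made up for illustration, and the "major version below 2" rule is taken from the behaviour described in this issue rather than from any xlrd API.

```python
from importlib import metadata


def xlrd_reads_xlsx(version):
    """Return True if this xlrd version string predates 2.0.

    Assumption (from this issue): xlrd 1.x reads .xlsx, xlrd 2.0+ does not.
    """
    major = int(version.split(".")[0])
    return major < 2


def installed_xlrd_version():
    """Return the installed xlrd version string, or None if xlrd is absent."""
    try:
        return metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_xlrd_version()
    if version is None:
        print("xlrd is not installed")
    elif xlrd_reads_xlsx(version):
        print(f"xlrd {version} should still read .xlsx files")
    else:
        print(f"xlrd {version} reads only .xls; use a workaround for .xlsx")
```

Running the script inside the conda environment used for pmutt should report which of the two cases applies.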

jonlym commented 3 years ago

Looks like Pandas developers suggested to downgrade xlrd.

We can update the setup file to use the last working version.

wittregr commented 3 years ago

It might also be useful to lock the xlrd version to 1.2.0 to avoid accidentally updating it to a newer version. Add a file named "pinned" to your conda-meta folder (usually in your Anaconda3 folder) containing the line:

xlrd ==1.2.0

This will prevent any updates from updating xlrd to a newer version.

hansgilead commented 3 years ago

Another option that worked for me is to specify engine='openpyxl' in the pd.read_excel call for .xlsx and later spreadsheets. This shouldn't be necessary, and it adds the complexity of figuring out in advance whether the spreadsheet you are trying to open is .xls or .xlsx. But if you're expecting a consistent file type, this is another possible workaround until someone fixes pd.read_excel to pick the correct engine based on file extension.
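For anyone who does need to handle both file types, the engine choice could be made from the file extension, roughly as below. This is a sketch rather than anything pmutt or pandas ships: pick_excel_engine and read_excel_auto are hypothetical helper names, and the extension-to-engine mapping is my assumption (xlrd for .xls, openpyxl for .xlsx and .xlsm). It also assumes openpyxl is installed.

```python
from pathlib import Path

# Assumed mapping for illustration: xlrd for legacy .xls workbooks,
# openpyxl for the newer .xlsx/.xlsm formats.
_ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def pick_excel_engine(path):
    """Return the pandas engine name to use for the given Excel file path."""
    suffix = Path(path).suffix.lower()
    try:
        return _ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unrecognized Excel extension: {suffix!r}") from None


def read_excel_auto(path, **kwargs):
    """Call pd.read_excel with an engine chosen from the file extension."""
    import pandas as pd  # imported here so the helper above works without pandas

    return pd.read_excel(path, engine=pick_excel_engine(path), **kwargs)
```

With this, read_excel_auto("data.xlsx") would pass engine='openpyxl', while read_excel_auto("data.xls") would fall back to xlrd.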

jonlym commented 3 years ago

That's a great suggestion, @hansgilead! Our users will probably only use 'xlsx' so this is a much more elegant solution than forcing users to use a certain version of xlrd.

@wittregr, I'll test this with a couple of our examples and make a new pull request.