CalebBell / chemicals

chemicals: Chemical database of Chemical Engineering Design Library (ChEDL)
MIT License

Pandas' new high-precision csv float parser is sometimes quite low precision #24

Closed CalebBell closed 3 years ago

CalebBell commented 3 years ago

Describe the bug

I was wondering why the CI started failing, and it turns out pandas 1.2.0 changed some defaults in its CSV parser. One of those changes was to use a higher-precision floating-point converter by default. chemicals reveals at least one bug in the new converter.

Minimal Reproducible Example

chemicals.viscosity.mu_data_VDI_PPDS_8['D']

In Pandas 1.1.2 when reading "0.00000000000001953" we get: 1.953E-14

In Pandas 1.2.1 we get: 1.95E-14
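
To put a number on the loss, here is a minimal sketch that compares the two values quoted above against Python's built-in float(), which is correctly rounded. The helper name relative_error is hypothetical and not part of chemicals or pandas, and the two inputs are simply the values reported in this issue taken at face value.

```python
def relative_error(parsed, text):
    """Relative error of a parsed float against Python's own float(text)."""
    reference = float(text)  # Python's float() is correctly rounded
    return abs(parsed - reference) / abs(reference)


# The literal from the data file, as quoted in this issue.
TEXT = "0.00000000000001953"

err_old = relative_error(1.953e-14, TEXT)  # value reported for pandas 1.1.2
err_new = relative_error(1.95e-14, TEXT)   # value reported for pandas 1.2.1

print(f"pandas 1.1.2: {err_old:.3e}")
print(f"pandas 1.2.1: {err_new:.3e}")
```

On those numbers the 1.2.1 result is off by about 0.15%, far more than any float64 rounding error would explain.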

Additional context

This also breaks results for people who use this library as a data source.

Workaround

It is possible to restore the old behavior with float_precision='legacy'. The two data files affected by this bug are now read with that setting in master. Ideally, pandas will fix the bug on their side. I didn't find an existing report of it in a cursory search.
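
As a rough sketch of the workaround, assuming the data is loaded through pandas.read_csv: the one-column CSV text and the function name read_with_legacy_parser below are made up for illustration and are not the actual chemicals loader or data file. float_precision='legacy' is only accepted by pandas 1.2.0 and later, and it applies to the C engine.

```python
import io

import pandas as pd

# Hypothetical one-column CSV; the value is the one quoted in this issue.
CSV_TEXT = "D\n0.00000000000001953\n"


def read_with_legacy_parser(text):
    """Parse CSV text with the original (pre-1.2.0) float converter."""
    # Assumption: pandas >= 1.2.0, where 'legacy' is a valid option.
    return pd.read_csv(io.StringIO(text), float_precision="legacy")


df = read_with_legacy_parser(CSV_TEXT)
value = float(df["D"].iloc[0])
print(value)
```

Code that still has to support pandas older than 1.2.0 would probably need to pass the argument conditionally, since 'legacy' is not an accepted value there.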

CalebBell commented 3 years ago

I released 1.0.0 with this fix and have now reported the bug to pandas: https://github.com/pandas-dev/pandas/issues/39514

Almost 40,000 bugs? I don't envy working on that project!