EPAENERGYSTAR / epathermostat

Methods for measuring and reporting connected thermostat savings
Potential dependency conflicts between thermostat and pandas #24

Closed NeolithEra closed 3 years ago

NeolithEra commented 4 years ago

Hi, as shown in the full dependency graph of thermostat below, thermostat requires pandas ==0.24.2, thermostat requires statsmodels (statsmodels 0.11.1 will be installed, i.e., the newest version satisfying the version constraint), and the direct dependency statsmodels 0.11.1 transitively introduces pandas >=0.21.

Obviously, there are multiple version constraints set for pandas in this project. However, under pip's “first found wins” installation strategy, pandas 0.24.2 (i.e., the newest version satisfying the constraint ==0.24.2) is the version actually installed.

Although the first-found version, pandas 0.24.2, satisfies the later dependency constraint (pandas >=0.21), it sits close to the lower bound of the pandas constraint specified by statsmodels 0.11.1.

Once statsmodels releases a new version, that version will be installed, because thermostat does not specify an upper bound on statsmodels. A dependency conflict (build failure) can therefore easily arise if the upgraded statsmodels requires a higher version of pandas, violating the other constraint, ==0.24.2.
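The scenario above can be sketched in a few lines of Python. This is only an illustration, not pip's resolver: the helper names are made up for this example, only plain numeric versions and single comparison operators are handled, and the `>=1.0` constraint is a hypothetical future statsmodels requirement, not one taken from its release history.

```python
import operator
import re

# Comparison operators supported by this simplified sketch.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def parse_version(text):
    """Turn '0.24.2' into (0, 24, 2). Plain numeric releases only."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, constraint):
    """Return True if `version` meets a single constraint such as '>=0.21'."""
    match = re.fullmatch(r"\s*(==|!=|>=|<=|>|<)\s*([0-9.]+)\s*", constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    op, bound = match.groups()
    left, right = parse_version(version), parse_version(bound)
    # Pad with zeros so that 1.0 and 1.0.0 compare as equal.
    width = max(len(left), len(right))
    left += (0,) * (width - len(left))
    right += (0,) * (width - len(right))
    return OPS[op](left, right)


def conflicts(version, constraints):
    """List every constraint that `version` violates."""
    return [c for c in constraints if not satisfies(version, c)]


# Today: the pin and the statsmodels 0.11.1 constraint agree.
print(conflicts("0.24.2", ["==0.24.2", ">=0.21"]))
# Hypothetical future statsmodels release that needs pandas >=1.0.
print(conflicts("0.24.2", ["==0.24.2", ">=1.0"]))
```

For real projects, the `packaging` library's `SpecifierSet` handles pre-releases and the full specifier syntax; the hand-rolled comparison here just keeps the example dependency-free.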

According to the release history of statsmodels, it regularly changes its pandas constraint in new releases. For instance, statsmodels 0.10.0rc1 changed its pandas constraint from >=0.14 to >=0.20, statsmodels 0.10.0rc2 changed it from >=0.20 to >=0.19, and statsmodels 0.11.0rc1 raised it from >=0.19 to >=0.21.

As such, this is a friendly warning of a potential dependency conflict for thermostat.

Dependency tree

thermostat - 1.7.1
| +- eemeter(install version:2.5.2 version range:==2.5.2)
| | +- click(install version:7.1.1 version range:*)
| | +- pandas(install version:0.24.2 version range:*)
| | +- scipy(install version:1.2.3 version range:*)
| | +- statsmodels(install version:0.11.1 version range:*)
| | | +- numpy(install version:1.18.2 version range:>=1.14)
| | | +- pandas(install version:0.24.2 version range:>=0.21)
| | | +- patsy(install version:0.5.1 version range:>=0.5)
| | | | +- numpy(install version:1.18.2 version range:>=1.4)
| | | | +- six(install version:1.14.0 version range:*)
| | | +- scipy(install version:1.2.3 version range:>=1.0)
| +- eeweather(install version:0.3.20 version range:==0.3.20)
| | +- click(install version:7.1.1 version range:*)
| | +- pandas(install version:0.24.2 version range:*)
| | +- pyproj(install version:1.9.6 version range:==1.9.6)
| | +- requests(install version:2.23.0 version range:*)
| | | +- certifi(install version:2020.4.5.1 version range:>=2017.4.17)
| | | +- chardet(install version:3.0.4 version range:>=3.0.2,<4)
| | | +- idna(install version:2.9 version range:>=2.5,<3)
| | | +- urllib3(install version:1.25.9 version range:>=1.21.1,<1.26)
| | +- shapely(install version:1.7.0 version range:*)
| +- pandas(install version:0.24.2 version range:==0.24.2)
| +- sqlalchemy(install version:1.3.1 version range:==1.3.1)
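A rough way to see the same information in an existing environment is to ask the installed package metadata which distributions declare a requirement on pandas. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the function names are invented for this example, and the output depends entirely on what is installed locally.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract a normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        return ""
    return match.group(1).lower().replace("_", "-").replace(".", "-")


def constraints_on(target):
    """Map each installed distribution to its declared requirements on `target`."""
    target = requirement_name(target)
    found = {}
    for dist in metadata.distributions():
        # `requires` is None when a distribution declares no requirements.
        for req in dist.requires or []:
            if requirement_name(req) == target:
                found.setdefault(dist.metadata["Name"], []).append(req)
    return found


if __name__ == "__main__":
    for name, reqs in sorted(constraints_on("pandas").items(), key=str):
        print(name, "->", reqs)
```

Requirements that only apply to optional extras are listed as well, so the raw strings may include environment markers.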

Thanks for your help. Best, Neolith

NeolithEra commented 4 years ago

Suggested Solution

  1. Loosen the version range of pandas to be >=0.24.2.
  2. Remove your direct dependency pandas, and use the pandas transitively introduced by statsmodels.
  3. Change your direct dependency statsmodels to be <=0.11.1.

@philngo Which solution do you prefer: 1, 2, or 3? Please let me know your choice. May I open a pull request to solve this issue?

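For illustration, the three options might look roughly like this as requirement lists. The names `OPTION_1` to `OPTION_3` are hypothetical, and the lists are assembled from the direct dependencies shown in the dependency tree above, not copied from thermostat's actual setup.py.

```python
# Direct dependencies as they appear in the dependency tree for thermostat 1.7.1.
CURRENT = [
    "eemeter==2.5.2",
    "eeweather==0.3.20",
    "pandas==0.24.2",
    "sqlalchemy==1.3.1",
]

# Option 1: loosen the pandas pin to a lower bound.
OPTION_1 = [r if not r.startswith("pandas") else "pandas>=0.24.2" for r in CURRENT]

# Option 2: drop the direct pandas requirement and rely on the transitive one.
OPTION_2 = [r for r in CURRENT if not r.startswith("pandas")]

# Option 3: keep the pandas pin and cap statsmodels explicitly.
OPTION_3 = CURRENT + ["statsmodels<=0.11.1"]

for label, reqs in [("1", OPTION_1), ("2", OPTION_2), ("3", OPTION_3)]:
    print(f"option {label}: {reqs}")
```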
craigmaloney commented 4 years ago

Hi @NeolithEra,

One of the reasons we pin the version of pandas is that we've had instances where a particular version of pandas didn't behave as we'd expect (0.21 was one such version). We are also working to support Pandas 1.0.

Our focus at the moment is to have the thermostat package drive its dependencies. That might not be ideal if another package is expecting to drive the dependency chain.

I'm not sure any of these options really works, although 3 is the best case for preventing the dependency clash that you described. That said, we're not directly depending on statsmodels (that appears to come in via eemeter).

My thought is that the pandas version that we specify should be enough of a constraint on the rest of the dependencies that they'll install the correct versions. That said, if thermostat is part of a larger package it might not be able to make that claim.

I would love to be able to have pandas not be a pinned package, but I'm leery of opening that up at the moment. Once we have a few more releases with the 1.0 branch of Pandas we can revisit loosening the restriction on the exact version of Pandas.

Thanks!

craigmaloney commented 3 years ago

Wanted to give an update on this issue.

After spending a year working through dependency issues, it seems that pinning versions for Pandas and numpy isn't the ideal solution. I pushed an update to the 2.x branch that updates to eemeter 3.1.0 and lets that drive the dependencies. I will keep monitoring this to see whether we start experiencing issues with Pandas and numpy, and work from there.

Going to close this for the time being (the 1.7 branch is in maintenance, and further development will happen in the 2.x branch).

Thanks!