With the current library installation flow, we can end up not testing against the most recent versions of our libraries if the dask or spark extras carry additional restrictions that are not part of the core requirements.
Take this run of the latest dependency checker, for example. It was triggered by a new version of pandas (2.0.3).
However, if you follow the installation process in the logs, you can see that during the `Install woodwork - requirements` step, pandas 2.0.3 and numpy 1.25.1 are both installed. Further down, we run `Install Dask and Spark`. Because the Spark requirements place upper bounds on both pandas and numpy, these libraries get downgraded to 1.5.3 and 1.23.5, respectively, before the tests start.
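The downgrade mechanics can be sketched in a few lines. Note the exact upper bounds below are hypothetical stand-ins for whatever the spark extra actually pins (check the project's requirements files for the real values); they are chosen so the check reproduces the downgrades seen in the logs.

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse a simple X.Y.Z version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

# Versions installed by the core requirements step (from the CI logs)
installed = {"pandas": "2.0.3", "numpy": "1.25.1"}

# Hypothetical exclusive upper bounds imposed by the spark extra
spark_upper_bounds = {"pandas": "2.0.0", "numpy": "1.24.0"}

for name, version in installed.items():
    bound = spark_upper_bounds[name]
    if parse(version) >= parse(bound):
        # pip's resolver will back off to an older release that
        # satisfies the stricter bound, undoing the earlier install
        print(f"{name}=={version} violates the spark bound <{bound}; "
              f"pip will downgrade it when the extra is installed")
```

Both installed versions fail the hypothetical bounds, which is why the later `Install Dask and Spark` step replaces them with older releases.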
The end result is that even though this PR suggests all tests pass with the latest version of pandas, the CI never actually ran against the new version that triggered the PR. To fix this, our test matrix should include a run with only the core requirements installed (no dask or spark extras), so that the latest versions allowed by the core requirements are actually exercised.
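One way such a core-only run could look is sketched below as a GitHub Actions job. This is only an illustrative config fragment: the job name, step names, Python version, and requirements file path are all assumptions to be adapted to the actual workflow.

```yaml
# Hypothetical job: test with core requirements only, so pip keeps
# the newest versions the core bounds allow (no spark/dask downgrades)
core_requirements_tests:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - uses: actions/setup-python@v4
      with:
        python-version: "3.10"
    - name: Install woodwork - core requirements only
      run: python -m pip install -e . -r test-requirements.txt
    - name: Run unit tests (no dask/spark extras installed)
      run: python -m pytest woodwork/
```

Because the dask and spark extras are never installed, their upper bounds never enter the resolution, and the job genuinely tests the versions that triggered the dependency-checker PR.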