Snowflake-Labs / schemachange

A Database Change Management tool for Snowflake
Apache License 2.0
481 stars, 219 forks

Dependency lock for Pandas (and arguably all packages) is too strict #205

Open dwreeves opened 8 months ago

dwreeves commented 8 months ago

TLDR: I believe Pandas should either be refactored out of the library entirely, or the upper bound on the version should be removed. There is also a very strong argument for removing the other upper bounds.

Upper bounds on dependencies are well intentioned: they are meant to help users doing standalone installations. In practice they can have the opposite effect, making installation in more complex environments a burden.


The setup.cfg has the following dependencies:

install_requires =
    jinja2~=3.0
    pandas~=1.3
    pyyaml~=6.0
    snowflake-connector-python>=2.8,<4.0

All of these strike me as a little too strict, given that only core API behavior is being utilized for each of the packages.

Pandas is the most noteworthy. Pandas is hardly used in the code in the first place. (In fact, the code could be refactored so that it is not a dependency at all.) What little is used is core API behavior. And, unlike pyyaml and jinja2, Pandas already has a new major release (2.x), which the pandas~=1.3 constraint excludes because it implies an upper bound of <2. There needn't be any upper bound on this dependency at all.
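To make the conflict concrete, here is a minimal pure-Python sketch of how a compatible-release specifier such as ~=1.3 behaves. It is only an illustration: the helper names (parse_release, satisfies_compatible_release) are hypothetical, and it handles plain numeric versions only, ignoring pre-releases, epochs, and the other PEP 440 details that real installers implement.

```python
from __future__ import annotations


def parse_release(version: str) -> tuple[int, ...]:
    """Split a plain numeric version like '1.5.3' into (1, 5, 3)."""
    return tuple(int(part) for part in version.split("."))


def _pad(release: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Right-pad a release tuple with zeros so tuples compare segment by segment."""
    return release + (0,) * (width - len(release))


def satisfies_compatible_release(spec: str, candidate: str) -> bool:
    """Return True if `candidate` satisfies `~=spec`.

    Simplified reading of the rule: `~=X.Y` means `>=X.Y` and `==X.*`,
    i.e. every segment of the spec except the last must match exactly.
    """
    base = parse_release(spec)
    if len(base) < 2:
        raise ValueError("~= requires at least two release segments")
    cand = parse_release(candidate)
    width = max(len(base), len(cand))
    at_least_base = _pad(cand, width) >= _pad(base, width)
    prefix = base[:-1]
    same_prefix = _pad(cand, width)[: len(prefix)] == prefix
    return at_least_base and same_prefix


if __name__ == "__main__":
    # pandas~=1.3 accepts any 1.x release from 1.3 onward...
    print(satisfies_compatible_release("1.3", "1.5.3"))  # True
    # ...but rejects every 2.x release, however compatible the API in use is.
    print(satisfies_compatible_release("1.3", "2.2.1"))  # False
```

Under this reading, any environment that needs Pandas 2.x for another package cannot be resolved alongside pandas~=1.3, whereas a lower-bound-only constraint (>=1.3) would accept both releases.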


While we are at it:

Overall these upper bounds strike me as overly cautious, especially in the case of Pandas.

StLWallace commented 2 months ago

Came here to post the same issue.

For local testing, we're using a monorepo where we also run Snowpark, which, if I'm not mistaken, requires pandas 2.2.1. I agree that the pandas version bound here should at least be raised, and ideally removed.