Snowflake-Labs / schemachange

A Database Change Management tool for Snowflake
Apache License 2.0
517 stars 228 forks source link

Dependency lock for Pandas (and arguably all packages) is too strict #205

Open dwreeves opened 1 year ago

dwreeves commented 1 year ago

TLDR: I believe Pandas should either be refactored out of the library entirely, or the upper bound on the version should be removed. There is also a very strong argument for removing the other upper bounds.

Upper bounds on dependencies are well-intended in practice to provide assistance to users doing standalone installations, but they can also have the opposite effect of making installation in more complex environments a burden.


The setup.cfg has the following dependencies:

install_requires =
    jinja2~=3.0
    pandas~=1.3
    pyyaml~=6.0
    snowflake-connector-python>=2.8,<4.0

All of these strike me as a little too strict, given that only core API behavior is being utilized for each of the packages.

Pandas is the most noteworthy. Pandas is hardly being used in the code in the first place. (In fact, the code could be refactored so that it is not even a dependency.) What little is being used is core API behavior. And, unlike pyyaml and jinja2, there exists a major version update (Pandas 2.x) that has a semantic version which is incompatible with the upper bound of <2. There needn't be any version locking whatsoever for this.


While we are at it:

Overall these upper bounds strike me as overly cautious, especially in the case of Pandas.

StLWallace commented 7 months ago

Came here to post the same issue.

For local testing, we're using a monorepo where we also run Snowpark, which if I'm not mistaken requires pandas 2.2.1. I agree that the pandas version here should be at least upgraded, and ideally removed.

sfc-gh-tmathew commented 3 months ago

Hello @StLWallace @dwreeves

v3.7.0 has removed pandas dependency. Please check and let us know if your issue is resolved.

dwreeves commented 3 months ago

Thanks @sfc-gh-tmathew! Do you have any thoughts on the additional dependency restrictions I mentioned may be too strict, specifically PyYaml and Jinja2?

sfc-gh-tmathew commented 3 months ago

@dwreeves We have a thread going on about organizing the code to make it easier to test, extend. The idea was to avoid breaking a tested release by pinning the highest version we tested with. Will test later versions of PyYaml and Jinja2 to make the upper limit less concerning.

dwreeves commented 3 months ago

Ok.

My 2 cents, I think it is not generally an issue for stable and mature projects to keep dependencies unpinned, or at most pinned to latest major version. Doubly true if you're just using the high level API, which this project does (it doesn't go deep into any of the functionality of these libraries at all.)

In general, I've seen more issues from strict pinning of deps than no pinning at all.

There is a real cost to users: in this case, I cannot install schemachange into my main environment because I need to be using the latest version of Jinja2, which is 3.1. Some other users may need to use a Jinja2 version < 3.

I've also seen this with other Snowflake python projects. Is this aggressive pinning an org-wide policy at Snowflake? 🤔


If there is another thread discussing a related issue, then you can mark this as complete, but I would hope the related issue addresses this.

As of right now, removing upper bounds for the deps should cause zero issues for users. (The lower bounds could also likely be lower as well.)

By default pip install in CI will always install the latest version that fits the constraint, so just running the CI normally tests the upper bound. You can use github actions matrix strategy or uv pip install --resolution=lowest-direct to test lower bounds for deps.

Thanks for your help with this!

sfc-gh-tmathew commented 3 months ago

As we have added more test case coverage, we will be able to test core functionality with more versions of pyYAML and Jinja2. Will work that into the next release.