Aiven-Open / pghoard

PostgreSQL® backup and restore service
http://aiven-open.github.io/pghoard/
Apache License 2.0

Freeze python requirements #585

Closed nicois closed 1 year ago

nicois commented 1 year ago

Retain the "loose" Python version requirements in a subdirectory, but resolve them to specific versions. This prevents CI regressions caused by newer upstream versions.

The upgrade-requirements script should be run periodically to bump the requirements, to avoid falling too far behind the current versions.
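A minimal sketch of such a freeze step (the paths and file names below are illustrative assumptions, not the actual layout used by this PR): install the loose requirements into a clean virtualenv, then record the exact versions that resolved.

```shell
# Illustrative freeze workflow: resolve loose requirements kept in a
# subdirectory into exact pins (all paths here are assumptions).
python3 -m venv /tmp/freeze-env
/tmp/freeze-env/bin/pip install -r requirements.loose/requirements.txt
# Record every package that actually got installed, at its exact version:
/tmp/freeze-env/bin/pip freeze > requirements.txt
```

Re-running this periodically (as the upgrade script is meant to do) keeps the pins from drifting too far behind upstream.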

codecov[bot] commented 1 year ago

Codecov Report

Merging #585 (aef38ce) into main (3448a1b) will decrease coverage by 0.09%. The diff coverage is n/a.

Additional details and impacted files

[Impacted file tree graph](https://codecov.io/gh/aiven/pghoard/pull/585?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=aiven)

```diff
@@            Coverage Diff             @@
##             main     #585      +/-   ##
==========================================
- Coverage   91.31%   91.22%   -0.09%
==========================================
  Files          32       32
  Lines        4675     4675
==========================================
- Hits         4269     4265       -4
- Misses        406      410       +4
```

[see 2 files with indirect coverage changes](https://codecov.io/gh/aiven/pghoard/pull/585/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=aiven)
alanfranz commented 1 year ago

@nicois CI is failing, but do we have a strategy about this? I lost track of evolutions in python packaging, but once upon a time wouldn't you define abstract deps in setup.py (i.e. without exact versions), and pin exact versions in requirements.txt? constraints file doesn't look the right way to do it here, we're just trying to "override" the fact that our requirements.txt doesn't contain versions?

kmichel-aiven commented 1 year ago

> @nicois CI is failing, but do we have a strategy about this? I lost track of evolutions in python packaging, but once upon a time wouldn't you define abstract deps in setup.py (i.e. without exact versions), and pin exact versions in requirements.txt? constraints file doesn't look the right way to do it here, we're just trying to "override" the fact that our requirements.txt doesn't contain versions?

constraints.txt is the documented way in pip to pin versions. It specifically says "IF you install the dependency, THEN use this version", unlike requirements.txt which says "install this dependency".
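A small illustration of that semantics (package names and version numbers are only examples):

```shell
# requirements.txt says WHAT to install; constraints.txt only says
# WHICH version to use IF something gets installed.
cat > requirements.txt <<'EOF'
requests>=2.25
EOF
cat > constraints.txt <<'EOF'
requests==2.31.0
urllib3==2.0.7
EOF
# urllib3 is never requested directly; its pin only takes effect
# because requests happens to pull it in as an indirect dependency.
pip install -r requirements.txt -c constraints.txt
```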

requirements.txt (or setup.py/setup.cfg, which are often more suitable for a library) can specify exact versions, but you rarely want that. If you do that, then your app/library has ~zero chance of being installed at the same time as other libraries that use the same strategy: the chances that you and all your dependencies and dependents pin the exact same version of every common dependency are ~zero.

You only put meaningful ranges in requirements.txt/setup.py/cfg: "I need version > 3.2 because the older one doesn't have that API I need".
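For instance, such meaningful ranges could look like this in setup.cfg (package names and bounds are invented for illustration):

```ini
# Illustrative setup.cfg fragment: only ranges with a reason behind them.
[options]
install_requires =
    requests>=2.25       # needs an API added in 2.25
    cryptography>=3.0    # older versions lack the primitives we use
```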

And then you use constraints.txt to pin versions for reproducible CI. (You don't need to ship constraints.txt to parent projects, they will do their own constraints.txt if they are also interested in reproducible builds.)

alanfranz commented 1 year ago

Sorry, I clicked "edit" instead of "quote reply". I hope your comment was restored, Kevin.

Full disclosure: I wasted so much time in my life with python packaging that I don't want to enter an argument about it. But I'd like a reference for whatever is "the way we do it", it can be internal, it can be external, whatever. But if we add our opinions to the Python packaging mess, we're done :-)

> constraints.txt is the documented way in pip to pin versions. It specifically says "IF you install the dependency, THEN use this version", unlike requirements.txt which says "install this dependency".

I'd say it's a documented way, rather than the.

> requirements.txt (or setup.py/setup.cfg, which are often more suitable for a library) can specify exact versions, but you rarely want that. If you do that, then your app/library has ~zero chance of being installed at the same time as other libraries that use the same strategy: the chances that you and all your dependencies and dependents pin the exact same version of every common dependency are ~zero.

requirements.txt is for manual use, like pip install -r requirements.txt. It's not distributed in an sdist, and when pip does dependency resolution, it uses the dependencies from setup.py/setup.cfg/pyproject. We're using requirements.txt just for CI purposes, not for anything else - when used in aiven-core, we'll leverage the versions that we have there.

If you pass direct dependencies in setup.py->install_requires, then freeze in requirements.txt for CI purposes... what doesn't work?

> You only put meaningful ranges in requirements.txt/setup.py/cfg: "I need version > 3.2 because the older one doesn't have that API I need". And then you use constraints.txt to pin versions for reproducible CI. (You don't need to ship constraints.txt to parent projects, they will do their own constraints.txt if they are also interested in reproducible builds.)

I don't see the difference between that and just having a single requirements.txt with all the pinned versions. It seems one more piece to the puzzle to me.

kmichel-aiven commented 1 year ago

> Sorry, I clicked "edit" instead of "quote reply". I hope your comment was restored, Kevin.
>
> Full disclosure: I wasted so much time in my life with python packaging that I don't want to enter an argument about it. But I'd like a reference for whatever is "the way we do it", it can be internal, it can be external, whatever. But if we add our opinions to the Python packaging mess, we're done :-)

Personal confession: I don't remember having any problem... and I don't understand why and it's confusing me. There are so many people consistently complaining about packaging that I know that something is wrong with python packaging, we have councils, alternative implementations, etc... I just have no idea what I'm doing or not doing to be so lucky. (Except for the part where I never had to do it on Windows, it seems things are more complicated there.)

> > constraints.txt is the documented way in pip to pin versions. It specifically says "IF you install the dependency, THEN use this version", unlike requirements.txt which says "install this dependency".
>
> I'd say it's a documented way, rather than the.

Ah yes, I meant it's not something custom or a hack, I didn't mean "it's the only way to do it" :)

> > requirements.txt (or setup.py/setup.cfg, which are often more suitable for a library) can specify exact versions, but you rarely want that. If you do that, then your app/library has ~zero chance of being installed at the same time as other libraries that use the same strategy: the chances that you and all your dependencies and dependents pin the exact same version of every common dependency are ~zero.
>
> requirements.txt is for manual use, like pip install -r requirements.txt. It's not distributed in an sdist, and when pip does dependency resolution, it uses the dependencies from setup.py/setup.cfg/pyproject. We're using requirements.txt just for CI purposes, not for anything else - when used in aiven-core, we'll leverage the versions that we have there.
>
> If you pass direct dependencies in setup.py->install_requires, then freeze in requirements.txt for CI purposes... what doesn't work?
>
> > You only put meaningful ranges in requirements.txt/setup.py/cfg: "I need version > 3.2 because the older one doesn't have that API I need". And then you use constraints.txt to pin versions for reproducible CI. (You don't need to ship constraints.txt to parent projects, they will do their own constraints.txt if they are also interested in reproducible builds.)
>
> I don't see the difference between that and just having a single requirements.txt with all the pinned versions. It seems one more piece to the puzzle to me.

The only really bad case is when you only have requirements.txt and no setup.py/cfg, in that case if you start pinning all dependencies, you quickly have no idea what is still necessary in your file: you update X, which doesn't depend on Y anymore, but Y is in your requirements.txt file so it will be installed forever. The only way to clean up is to remove random lines, create an env, run the code, pray you have no lazy imports, and repeat for each possibly-outdated dependency.

As soon as you have two files, things get much better: you have your direct dependencies with meaningful constraints (ideally it's setup.py/cfg) for proper packaging, and another file for CI/dev with all the extra dependencies for testing (this would be requirements.txt).

You could also put your indirect dependencies and pinned versions in that same requirements.txt file. But then you have the same problem as the really bad case, only for the test dependencies: it becomes really hard to clean up indirect dependencies of testing tools, and you can't just install what's in setup.py and pip freeze, because the test dependencies would be missing.

Even without that maintenance issue, on a large project the file quickly becomes unreadable because of the volume of indirect dependencies. Then you start to add sections with comments on top to explain why things are there, and desperately try to remember and explain who depends on what, which further prevents you from doing the automated venv/freeze loop.

That's where constraints.txt does exactly what you want.
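Put together, the three-file split described above might look like this (a sketch; the file roles are as described, the command itself is just one way to wire them up):

```shell
# setup.cfg        -> direct dependencies with meaningful ranges
# requirements.txt -> extra test/dev dependencies
# constraints.txt  -> exact pins for everything, for reproducible CI
pip install -e . -r requirements.txt -c constraints.txt
```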

And you can still remove direct dependencies from setup.py/cfg or requirements.txt, do the venv/freeze loop to refresh the pinned list, and not carry over old dependencies.

As a bonus, constraints.txt does not force-install all conditional dependencies. It does not matter a lot for user-controlled optional dependencies, since the file is for CI/devs and you often want all the optional bits, to be able to test them. It matters more for dependencies that are conditional on the environment: things like importlib-metadata; python_version<"3.8" in setup.cfg. If you want to pin the version for that package, you can't just put it in requirements.txt, as this would also install it on recent Pythons. (You could have multiple requirements.txt files and a bash script that picks the right one depending on the combination of parameters, but... no.)
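A sketch of that last point (the version number is invented): because a constraint only applies if the package is installed, the pin needs no environment marker even though the dependency itself is conditional.

```
# constraints.txt -- this pin only takes effect when importlib-metadata
# is actually installed, i.e. on Python < 3.8, so no marker is needed:
importlib-metadata==4.13.0
```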

nicois commented 1 year ago

Either way we do it, we need to support 3 situations: