RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

The case for removing poetry.lock? #2835

Open ashleysommer opened 3 months ago

ashleysommer commented 3 months ago

We've increasingly been seeing CI jobs for PRs fail because the poetry.lock file is out of sync with the pyproject.toml file, or because there is a hash conflict in the poetry.lock file. These failures are usually caused by dependabot updating only pyproject.toml and not poetry.lock (though it is supposed to update both), or by dependabot generating an invalid poetry.lock file, which happens sometimes.

There are lots of other ways it can get out of sync, including:

Documentation about this from Poetry:

Committing this file to VC is important because it will cause anyone who sets up the project to use the exact same versions of the dependencies that you are using. Your CI server, production machines, other developers in your team, everything and everyone runs on the same dependencies, which mitigates the potential for bugs affecting only some parts of the deployments. Even if you develop alone, in six months when reinstalling the project you can feel confident the dependencies installed are still working even if your dependencies released many new versions since then. (See note below about using the update command.)

And also from the Poetry docs:

For libraries it is not necessary to commit the lock file.

Again, this comes down to the old question: is RDFLib an application or a library? If it is an application, we want everyone who contributes to RDFLib to be using exactly the same dependency versions and producing exactly the same builds, and we need to build RDFLib in a reproducible way.

If RDFLib is a library, then we can simply include a compatibility range of dependency versions in the pyproject.toml file, and it is up to the end user's package management tool to install the right versions that work with the final application.
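To make the idea of a compatibility range concrete, here is a minimal sketch of how a resolver might decide whether a candidate version falls inside a declared range. The helper names (parse_version, in_range) and the example bounds are hypothetical, and real tools handle pre-releases, epochs, and other cases that this ignores.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '3.1' into a padded tuple (3, 1, 0)."""
    parts = [int(piece) for piece in text.split(".")]
    # Pad to three components so that '4' and '4.0.0' compare as equal.
    return tuple(parts + [0] * (3 - len(parts)))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper (inclusive lower, exclusive upper)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Hypothetical range for an imaginary dependency: >=2.4.7,<4
for candidate in ("2.4.6", "2.4.7", "3.1.1", "4.0.0"):
    print(candidate, in_range(candidate, "2.4.7", "4"))
```

With a range like this, any release between the bounds is acceptable, so the end user's tool is free to pick whichever one also satisfies the rest of their application.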

Personally, I only ever use RDFLib as a library. All of my applications, even small scripts, have their own dependency list and start with import rdflib. To me that makes RDFLib a library. But I know there are others who use RDFLib only for the built-in CLI tools. So to them, RDFLib is an application.

See this Stack Overflow answer for a well-thought-out response to this same issue: https://stackoverflow.com/a/61076546

And also this comment on the Stack Overflow answer:

I help maintain a number of closed and open source projects, and they all commit lockfiles, partly because I advocated in favor of it. By now I regret that choice, because it occurs quite often that someone's build is not working and the solution is to delete and re-build the lockfile, after which all of us end up having merge conflicts. – Arne, Apr 7, 2020 at 12:02

It's now 2024 and it seems people (including us) are facing the same issue.

This is an argument from that thread in favor of keeping the lockfile:

Poetry's lock file is an universal lock file.

This means that Poetry doesn't care about the current environment, neither the Python version in use, nor the platform. Instead it makes sure that dependencies are resolvable within the given Python version range in pyproject.toml. This results in a lock file that is valid on any platform with a Python version within the range given in the pyproject.toml.

This difference to other tools, that produces lock file, is also the reason why Poetry is slower in resolving dependencies. This is also the reason why it is recommended to check in the poetry.lock in your vcs. Doing so, it speed up setting up your development environment and you make sure your environment is reproducible.

So we need to either a) find a way to keep everyone using the same lockfile, keep the lockfile up to date in main for every PR (including dependabot PRs), and probably even ensure all contributors and all CI environments use the same version of Poetry; or b) remove the lockfile from the repo.

ashleysommer commented 3 months ago

Note: I found this thread on the Poetry issue tracker that explains the content-hash issue, why it happens, and potential workarounds: https://github.com/python-poetry/poetry/issues/496
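To illustrate why editing pyproject.toml without regenerating the lockfile gets flagged, here is a minimal sketch of a content-hash check. It is a simplified illustration and not Poetry's actual implementation: the names RELEVANT_KEYS, content_fingerprint and lock_is_stale, and the choice of which sections count, are assumptions made for the example.

```python
import hashlib
import json

# Assumption: only these sections influence dependency resolution.
RELEVANT_KEYS = ("dependencies", "group", "extras", "source")


def content_fingerprint(poetry_config: dict) -> str:
    """Hash only the dependency-related sections of the project config."""
    relevant = {key: poetry_config[key] for key in RELEVANT_KEYS if key in poetry_config}
    # sort_keys makes the hash independent of key order in the file.
    payload = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def lock_is_stale(poetry_config: dict, recorded_hash: str) -> bool:
    """Return True if the config no longer matches the hash stored in the lockfile."""
    return content_fingerprint(poetry_config) != recorded_hash


# Hypothetical project config and the hash recorded when the lockfile was written.
config = {"name": "example", "dependencies": {"example-dep": ">=2.4.7,<4"}}
recorded = content_fingerprint(config)

# A bot bumps the constraint but does not regenerate the lockfile.
config["dependencies"]["example-dep"] = ">=3.0,<4"
print(lock_is_stale(config, recorded))
```

Under this model, any change to a dependency section invalidates the recorded hash, while edits to unrelated metadata such as the project name do not.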

edmondchuc commented 3 months ago

What benefits do we currently gain from having the lockfile, besides ensuring a successful poetry install after cloning the repo?

Option b) does sound appealing due to less maintenance overhead.

For removing the lockfile:

Even with the application use case of RDFLib, as long as the dependency version constraints are specified in pyproject.toml, they should be supported and guaranteed to work. If there are any runtime issues, then we should update the dependency version constraints or have code paths in the application itself that handle the dependency's breaking changes.
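As a rough sketch of what "code paths that handle the dependency breaking changes" could look like, the snippet below branches on the installed major version of a dependency using the standard library's importlib.metadata. The distribution name "example-dep", the version threshold, and the helper names are hypothetical and chosen only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_major(dist_name: str) -> Optional[int]:
    """Return the installed major version of a distribution, or None if unavailable."""
    try:
        return int(version(dist_name).split(".")[0])
    except (PackageNotFoundError, ValueError):
        # Not installed, or the version string does not start with a plain integer.
        return None


def choose_code_path(major: Optional[int]) -> str:
    """Pick a code path label based on the dependency's major version."""
    if major is None:
        return "dependency-missing"
    if major >= 2:  # hypothetical release that introduced a breaking change
        return "new-api"
    return "legacy-api"


# "example-dep" is a placeholder; substitute a real dependency name.
print(choose_code_path(installed_major("example-dep")))
```

The trade-off is that each supported range of the dependency adds a branch to maintain, which is the cost of not pinning exact versions.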

Any other considerations for option b)?

nicholascar commented 1 month ago

Keep in 7.x, remove in 8.x after tidy up of pyproject.toml min versions