gitpython-developers / GitPython

GitPython is a python library used to interact with Git repositories.
http://gitpython.readthedocs.org
BSD 3-Clause "New" or "Revised" License
4.62k stars 907 forks source link

3.1.38 meant to fix #1656 but there is no supporting gitdb release #1716

Closed EliahKagan closed 1 year ago

EliahKagan commented 1 year ago

To support #1659 here in GitPython, gitdb received #98, and the GitPython repository's submodule reference to gitdb was updated in #1704 (and #1705). Thus #1656 might be1 fixed when developing against the cloned GitPython repository with its gitdb submodule.

With the intention of fixing #1656 in the far more common case of using GitPython via PyPI packages, GitPython 3.1.38 was released. But no new release of gitdb was ever made. Since the fix depends on changes to gitdb, it seems unlikely that 3.1.38 really fixes #1656.

I think it might be possible to make further refinements to __all__ and surrounding code, slightly enhancing #1659. However, since 3.1.38 was already released, it seems to me that such things can wait, and that it's reasonable to make a gitdb release (bumping the version in its setup.py and gitdb/__init__.py, or changing the current approach and bumping it in just one place), and to include a dependency bump requiring it the next GitPython release (which could be the same release that ships #1715, if the timing works out).

I'd be pleased to open pull requests to help with this issue, either following this plan or another one, if that would be helpful.


1 I had originally said it was likely fixed when developing against the cloned GitPython. However, that is probably not the case. First, most type checkers probably wouldn't understand how that version is added to sys.path, and it is probably good design of them not to (since they are static checkers). Second, as documented in #1717, it turns out this is never actually happening.

Byron commented 1 year ago

Thanks for bringing this to my attention!

I'd be pleased to open pull requests to help with this issue, either following this plan or another one, if that would be helpful.

Yes, I'd very much like your help with this, thank you 🙏. I'd be glad if that PR could also remove the version from setup.py if that's non-breaking so the version has to be changed in only one place from then on, and if possible because it's non-breaking.

EliahKagan commented 1 year ago

On further consideration, this might be not be the best time to fix the duplication.

This should not stop a patch version of gitdb from being released, so this is kind of a side topic. But my detailed thinking about why it may be better to fix this issue with a new patch version of gitdb without eliminating its metadata duplication follows.

Immediate options, and their drawbacks

Although the repetition is explained in terms of expected style, there is one benefit of it that may be worth preserving: keeping the smmap package from having to be installed just to build gitdb packages. If setup.py imports gitdb to get the metadata, it looks like the indirect imports through gitdb/__init__.py (both in _init_externals and in the * imports at the bottom of the file) require smmap to be available. I worry that some downstream builds may be depending on that not being required.

This could be fixed, without turning smmap or any future dependencies into build dependencies, by any scheme that determines the version without importing the code. This includes the very ad-hoc approach GitPython takes of substituting a version for a placeholder, which I'd be reluctant to introduce to gitdb and would prefer to find a way for GitPython to stop using too (see below), or a reverse ad-hoc approach where the version is hard-coded in packaged files and read by setup.py without importing them, which I would also be reluctant to introduce (also see below).

It can also be solved, I think better, with packages that specialize in automatically setting versions from source control tags, such as setuptools_scm or versioneer. However, if there are downstream build recipes that don't need smmap because they don't run the tests, they might also be done in way that doesn't have gitdb's .git directory, and so might fail to get the version.

Slightly longer-term idea

Unlike the situation in #1713--where a downstream build recipe was broken in a specific way that GitPython intended to guarantee would not break--I actually think breaking downstream builds for this sort of thing is totally defensible.

However, as a separate goal from de-duplicating version information, I think it would be best for GitPython, gitdb, and smmap to replace all their setup.py logic with a declarative approach, where everything would be defined in pyproject.toml, or if necessary setup.cfg; and where setup.py would either become a short stub script that just calls setup() with no arguments, or be removed altogether. They should also if possible all do it the same way, so that familiarity with one helps one understand the others, and so it is usually easy to know if a change made to one should be made to the others and, if so, how it should be made.

(Note that if the GitPython repository becomes a monorepo as discussed in https://github.com/gitpython-developers/smmap/pull/53#issuecomment-1762668234, the three projects would still be defined separately. So this would go along with or perhaps even support such a change, rather than clashing with or being an alternative to it.)

For this reason, I am reluctant to make any changes to any of these projects' setup.py files that would move further from that and make it harder or more complicated (and thus more likely to introduce bugs if and when done). That is, I would prefer to avoid making setup.py unnecessarily less declarative. In addition, it might turn out that fixing version duplication, so that the version is expressed in either one or (by getting it from tags automatically) zero places in the code, could naturally be done together with making the setup scripts fully declarative.

That sort of change could really be done at any time. But the current (temporary) skew, which this issue is about, between the current versions of GitPython and gitdb with respect to __all__, is something that would make me uneasy about doing it immediately or as part of the change that fixes this issue. Therefore, I suggest putting it off.

Should I still open PRs?

You might prefer I still open pull requests. You might:

Byron commented 1 year ago

However, as a separate goal from de-duplicating version information, I think it would be best for GitPython, gitdb, and smmap to replace all their setup.py logic with a declarative approach, where everything would be defined in pyproject.toml, or if necessary setup.cfg; and where setup.py would either become a short stub script that just calls setup() with no arguments, or be removed altogether. They should also if possible all do it the same way, so that familiarity with one helps one understand the others, and so it is usually easy to know if a change made to one should be made to the others and, if so, how it should be made.

Yes, please! Let's not do any changes to the way the packages are defined unless it makes them declarative.

Should I still open PRs?

If I understood correctly, I could create a new patch release of gitdb and everything should be in good order again I will do that right away.

Byron commented 1 year ago

Alright, I managed and here is the new gitdb release: https://github.com/gitpython-developers/gitdb/releases/tag/4.0.11.

On another note, it seems like keeping the version duplicated in __init__ of the package itself isn't anything we'd want anyway. It's a breaking change to remove the field, but I think it would be alright to not update it anymore and maybe even make it emit a deprecation notice on access, if that's possible at all. Without that duplication, having a declarative package should be fine without adding any need for complications.

EliahKagan commented 1 year ago

If I understood correctly, I could create a new patch release of gitdb and everything should be in good order again

Arguably GitPython's declared dependency on gitdb should have its minimum version raised to the new version. However, I am not really sure that should be done, and if it is to be done, it might be better to wait for feedback from @DeflateAwning (author of #1659 and gitdb#98) on whether things are working (or at least nothing new is broken) when using pyright with the latest versions of both packages.

I will do that right away.

Yes, whether or not anything further has to be changed in GitPython in connection with this, making that change in gitdb now should work fine.

DeflateAwning commented 1 year ago

Everything is good! Main reported issues are fixed. Ideal siuation would be a release on gitdb, bump the submodule version, and release gitpython!

DeflateAwning commented 1 year ago

Furthermore, nothing new is broken, as far as I can tell. Someone could double check, but it's probably fine

Byron commented 1 year ago

Everything is good! Main reported issues are fixed. Ideal siuation would be a release on gitdb, bump the submodule version, and release gitpython!

The gitdb release is done, and somehow I hope that I can avoid another near-noop GitPyhon release and let it wait until there is something substantial. Ideally, the submodule state won't matter at all for packaging GitPython, so maybe that's something to work on as well.

EliahKagan commented 1 year ago

It seems to me that this issue can be considered fixed.

EliahKagan commented 1 year ago

On another note, it seems like keeping the version duplicated in __init__ of the package itself isn't anything we'd want anyway. It's a breaking change to remove the field, but I think it would be alright to not update it anymore and maybe even make it emit a deprecation notice on access, if that's possible at all. Without that duplication, having a declarative package should be fine without adding any need for complications.

A DeprecationWarning could be be issued when the __version__ attribute is accessed. Here's an example of one way to do that.

However, whether or not that is done, I recommend against having a top-level __version__ attribute with a version that is intentionally older than the actual current version, because:

I think replacing all the non-stub setup.py logic with a declarative way is worthwhile in all three projects (GitPython, gitdb, and smmap), because even if they all end up moving into the GitPython repository in the future, it sounds like they'll remain separate projects that therefore need to be defined separately.

Disadvantages of __version__ existing but giving an incorrect version

The good news: we shouldn't need to do that to eliminate duplication

We can eliminate the duplication as part of the same change as defining the projects declaratively, and do so without turning a project's runtime dependencies into build dependencies. I suggest this be done at the same time, or around the same time, for GitPython, gitdb, and smmap, and in the same or similar ways for all of them.

I will assume that, at least in the immediate future, it is intended--for any of these projects in which such a change is made--that its version exist in exactly one place in the repository, and thus not be duplicated and also not be computed from tags. (Also, that the GitPython repository might become a monorepo is a reason to avoid getting versions from tags, which I hadn't thought of before.)

Three options, any of which I would advocate as reasonable, are:

However this is done, I think the main benefit is actually to GitPython rather than gitdb (or smmap), because all the custom version-stamping logic in setup.py could be eliminated. Since that logic was the main hurdle for converting GitPython to use a declarative package definition, this seems like a win. But of course it would also allow the version to be deduplicated in gitdb. I recommend the same approach be used for all three packages unless a strong reason arises not to, Which approach should be taken, I am not sure, which is one reason I'm writing this instead of opening pull requests.

If either of the first two ways are done, then accessing __version__ from __init__.py could either issue a DeprecationWarning or not. If the third way is done, then it should not (if accessing __version__ in the top-level module is deprecated, then that's not how building should access it either). If the idea of doing it the same way in all three projects is followed, then I think that includes issuing the DeprecationWarning either in all three projects or none of them. In that case, the __version__ attribute should only be deprecated if its documented role for GitPython bug reports is somehow superseded by some other recommendation.

A few caveats

Especially relevant documentation

Byron commented 1 year ago

Thanks again for this fantastic analysis!

I was suggesting deprecation primarily due to my ignorance about the available options, and going with…

  • If the version is given in pyproject.toml or setup.cfg [..]

…seems like the way to go. I assume that the importlib dependency can only be imported when the version is actually accessed, so startup time shouldn't be affected by something that might rarely be used if at all.

Sticking with setuptools seems alright as long as it's the most supported one - with the switch to its declarative version I'd expect all ugliness around it to mostly go away.

Regarding priority, I see how native windows CI support and declarative setuptools can be independent, but I also see why native windows support will help with testing and validating any future change. As always, this choice lies solely with you :).