Closed EliahKagan closed 1 year ago
Thanks for bringing this to my attention!
I'd be pleased to open pull requests to help with this issue, either following this plan or another one, if that would be helpful.
Yes, I'd very much like your help with this, thank you 🙏.
I'd be glad if that PR could also remove the version from setup.py
if that's non-breaking so the version has to be changed in only one place from then on, and if possible because it's non-breaking.
On further consideration, this might be not be the best time to fix the duplication.
This should not stop a patch version of gitdb from being released, so this is kind of a side topic. But my detailed thinking about why it may be better to fix this issue with a new patch version of gitdb without eliminating its metadata duplication follows.
Although the repetition is explained in terms of expected style, there is one benefit of it that may be worth preserving: keeping the smmap
package from having to be installed just to build gitdb
packages. If setup.py
imports gitdb
to get the metadata, it looks like the indirect imports through gitdb/__init__.py
(both in _init_externals
and in the *
imports at the bottom of the file) require smmap
to be available. I worry that some downstream builds may be depending on that not being required.
This could be fixed, without turning smmap or any future dependencies into build dependencies, by any scheme that determines the version without importing the code. This includes the very ad-hoc approach GitPython takes of substituting a version for a placeholder, which I'd be reluctant to introduce to gitdb and would prefer to find a way for GitPython to stop using too (see below), or a reverse ad-hoc approach where the version is hard-coded in packaged files and read by setup.py
without importing them, which I would also be reluctant to introduce (also see below).
It can also be solved, I think better, with packages that specialize in automatically setting versions from source control tags, such as setuptools_scm
or versioneer
. However, if there are downstream build recipes that don't need smmap because they don't run the tests, they might also be done in way that doesn't have gitdb's .git
directory, and so might fail to get the version.
Unlike the situation in #1713--where a downstream build recipe was broken in a specific way that GitPython intended to guarantee would not break--I actually think breaking downstream builds for this sort of thing is totally defensible.
However, as a separate goal from de-duplicating version information, I think it would be best for GitPython, gitdb, and smmap to replace all their setup.py
logic with a declarative approach, where everything would be defined in pyproject.toml
, or if necessary setup.cfg
; and where setup.py
would either become a short stub script that just calls setup()
with no arguments, or be removed altogether. They should also if possible all do it the same way, so that familiarity with one helps one understand the others, and so it is usually easy to know if a change made to one should be made to the others and, if so, how it should be made.
(Note that if the GitPython repository becomes a monorepo as discussed in https://github.com/gitpython-developers/smmap/pull/53#issuecomment-1762668234, the three projects would still be defined separately. So this would go along with or perhaps even support such a change, rather than clashing with or being an alternative to it.)
For this reason, I am reluctant to make any changes to any of these projects' setup.py
files that would move further from that and make it harder or more complicated (and thus more likely to introduce bugs if and when done). That is, I would prefer to avoid making setup.py
unnecessarily less declarative. In addition, it might turn out that fixing version duplication, so that the version is expressed in either one or (by getting it from tags automatically) zero places in the code, could naturally be done together with making the setup scripts fully declarative.
That sort of change could really be done at any time. But the current (temporary) skew, which this issue is about, between the current versions of GitPython and gitdb with respect to __all__
, is something that would make me uneasy about doing it immediately or as part of the change that fixes this issue. Therefore, I suggest putting it off.
You might prefer I still open pull requests. You might:
dependabot.yml
, in the GitPython and gitdb repositories, from monthly to weekly, per #1702 (comment). This could be done either with or without other pull requests (or as part of them).However, as a separate goal from de-duplicating version information, I think it would be best for GitPython, gitdb, and smmap to replace all their
setup.py
logic with a declarative approach, where everything would be defined inpyproject.toml
, or if necessarysetup.cfg
; and wheresetup.py
would either become a short stub script that just callssetup()
with no arguments, or be removed altogether. They should also if possible all do it the same way, so that familiarity with one helps one understand the others, and so it is usually easy to know if a change made to one should be made to the others and, if so, how it should be made.
Yes, please! Let's not do any changes to the way the packages are defined unless it makes them declarative.
Should I still open PRs?
If I understood correctly, I could create a new patch release of gitdb
and everything should be in good order again
I will do that right away.
Alright, I managed and here is the new gitdb
release: https://github.com/gitpython-developers/gitdb/releases/tag/4.0.11.
On another note, it seems like keeping the version duplicated in __init__
of the package itself isn't anything we'd want anyway. It's a breaking change to remove the field, but I think it would be alright to not update it anymore and maybe even make it emit a deprecation notice on access, if that's possible at all. Without that duplication, having a declarative package should be fine without adding any need for complications.
If I understood correctly, I could create a new patch release of
gitdb
and everything should be in good order again
Arguably GitPython's declared dependency on gitdb should have its minimum version raised to the new version. However, I am not really sure that should be done, and if it is to be done, it might be better to wait for feedback from @DeflateAwning (author of #1659 and gitdb#98) on whether things are working (or at least nothing new is broken) when using pyright
with the latest versions of both packages.
I will do that right away.
Yes, whether or not anything further has to be changed in GitPython in connection with this, making that change in gitdb now should work fine.
Everything is good! Main reported issues are fixed. Ideal siuation would be a release on gitdb, bump the submodule version, and release gitpython!
Furthermore, nothing new is broken, as far as I can tell. Someone could double check, but it's probably fine
Everything is good! Main reported issues are fixed. Ideal siuation would be a release on gitdb, bump the submodule version, and release gitpython!
The gitdb
release is done, and somehow I hope that I can avoid another near-noop GitPyhon release and let it wait until there is something substantial. Ideally, the submodule state won't matter at all for packaging GitPython
, so maybe that's something to work on as well.
It seems to me that this issue can be considered fixed.
On another note, it seems like keeping the version duplicated in
__init__
of the package itself isn't anything we'd want anyway. It's a breaking change to remove the field, but I think it would be alright to not update it anymore and maybe even make it emit a deprecation notice on access, if that's possible at all. Without that duplication, having a declarative package should be fine without adding any need for complications.
A DeprecationWarning
could be be issued when the __version__
attribute is accessed. Here's an example of one way to do that.
However, whether or not that is done, I recommend against having a top-level __version__
attribute with a version that is intentionally older than the actual current version, because:
__version__
existing with a wrong value, exacerbated by how DeprecationWarning
is rarely actually seen except when a developer is deliberately looking for it.I think replacing all the non-stub setup.py
logic with a declarative way is worthwhile in all three projects (GitPython, gitdb, and smmap), because even if they all end up moving into the GitPython repository in the future, it sounds like they'll remain separate projects that therefore need to be defined separately.
__version__
existing but giving an incorrect version__version__
, they're often debugging something else, and the possibility that __version__
exists but is not current is unlikely to be considered. Users may go on a wild goose chase trying to figure out why they don't seem to be able to get the latest version installed, may suspect a problem with sys.path
, etc. Note that this does not make values like "git"
harmful, just old or otherwise wrong version numbers.DeprecationWarning
and PendingDeprecationWarning
are not shown by default except when issued directly from the running program's entry-point module. This does not make them useless, but it does make it so that people who examine __version__
when trying to figure out why something isn't working won't usually be helped by them.__version__
is sort of unclear, since I believe its meaning is not directly documented anywhere; instead, its use is sort of officially smiled on by numerous official references to it, examples of it, and guidance about things to do when using it. However, having __version__
exist and give something unambiguously considered not to be correct version information probably, at minimum, constitutes a more severe misuse of a dunder than having __version__
exist and give a correct version.We can eliminate the duplication as part of the same change as defining the projects declaratively, and do so without turning a project's runtime dependencies into build dependencies. I suggest this be done at the same time, or around the same time, for GitPython, gitdb, and smmap, and in the same or similar ways for all of them.
I will assume that, at least in the immediate future, it is intended--for any of these projects in which such a change is made--that its version exist in exactly one place in the repository, and thus not be duplicated and also not be computed from tags. (Also, that the GitPython repository might become a monorepo is a reason to avoid getting versions from tags, which I hadn't thought of before.)
Three options, any of which I would advocate as reasonable, are:
pyproject.toml
or setup.cfg
, it is static metadata that can itself be inspected by tooling that can parse TOML or INI (or, if some brittleness is acceptable, as in check-version.sh
, by grep
). So it should not need to be repeated anywhere else. The __version__
attribute in __init__.py
can then be populated--or computed on access--using importlib.metadata.version
(which is what the example linked above does). To preserve support for Python 3,7, a dependency on the importlib_metadata
backport can be added just for 3.7, which I think is no problem.VERSION
file, this can still be used by a declaratively defined project. In pyproject.toml
, the dynamic metadata feature with the file
directive can be used to read a VERSION
file. With setup.cfg
, the file:
directive can be used with the same effect. Then __init__.py
can obtain the version with importlib.metadata
the same as if it were written literally in the pyproject.toml
or setup.cfg
, as described above. Alternatively, if the VERSION
file is also included in the distribution as package data, then there is the further option for __init__.py
to use importlib.resources
to read the VERSION
file.__init__.py
, but the package is declaratively defined, then so long as the version is assigned literally, pyproject.toml
or setup.cfg
can cause it to be read without actually importing anything. In pyproject.toml
, dynamic metadata with the attr
directive will do this. With setup.cfg
, the attr:
directive would be used. Compare this to the file
/file:
directive described above. The attr
/attr:
directive first tries to read an assignment of an ast.literal_eval
able expression to the name, and only imports the module if it can't find it that way--but it is simple to ensure that always succeeds. This works even if actually importing the module would fail, for example due to a missing dependency. (My understanding is that this feature is not available when defining a project in setup.py
, but only when defining it declaratively in pyproject.toml
or setup.cfg
.)However this is done, I think the main benefit is actually to GitPython rather than gitdb (or smmap), because all the custom version-stamping logic in setup.py
could be eliminated. Since that logic was the main hurdle for converting GitPython to use a declarative package definition, this seems like a win. But of course it would also allow the version to be deduplicated in gitdb. I recommend the same approach be used for all three packages unless a strong reason arises not to, Which approach should be taken, I am not sure, which is one reason I'm writing this instead of opening pull requests.
If either of the first two ways are done, then accessing __version__
from __init__.py
could either issue a DeprecationWarning
or not. If the third way is done, then it should not (if accessing __version__
in the top-level module is deprecated, then that's not how building should access it either). If the idea of doing it the same way in all three projects is followed, then I think that includes issuing the DeprecationWarning
either in all three projects or none of them. In that case, the __version__
attribute should only be deprecated if its documented role for GitPython bug reports is somehow superseded by some other recommendation.
conf.py
reads the VERSION
file, while in gitdb and in smmap conf.py
seem to hard-code very old versions. (But probably the approach for managing the version in general should not be chosen based on Sphinx; instead, conf.py
can be modified accordingly.)Thanks again for this fantastic analysis!
I was suggesting deprecation primarily due to my ignorance about the available options, and going with…
- If the version is given in
pyproject.toml
orsetup.cfg
[..]
…seems like the way to go. I assume that the importlib
dependency can only be imported when the version is actually accessed, so startup time shouldn't be affected by something that might rarely be used if at all.
Sticking with setuptools
seems alright as long as it's the most supported one - with the switch to its declarative version I'd expect all ugliness around it to mostly go away.
Regarding priority, I see how native windows CI support and declarative setuptools can be independent, but I also see why native windows support will help with testing and validating any future change. As always, this choice lies solely with you :).
To support #1659 here in GitPython, gitdb received #98, and the GitPython repository's submodule reference to gitdb was updated in #1704 (and #1705). Thus #1656 might be1 fixed when developing against the cloned GitPython repository with its gitdb submodule.
With the intention of fixing #1656 in the far more common case of using GitPython via PyPI packages, GitPython 3.1.38 was released. But no new release of gitdb was ever made. Since the fix depends on changes to gitdb, it seems unlikely that 3.1.38 really fixes #1656.
I think it might be possible to make further refinements to
__all__
and surrounding code, slightly enhancing #1659. However, since 3.1.38 was already released, it seems to me that such things can wait, and that it's reasonable to make a gitdb release (bumping the version in itssetup.py
andgitdb/__init__.py
, or changing the current approach and bumping it in just one place), and to include a dependency bump requiring it the next GitPython release (which could be the same release that ships #1715, if the timing works out).I'd be pleased to open pull requests to help with this issue, either following this plan or another one, if that would be helpful.
1 I had originally said it was likely fixed when developing against the cloned GitPython. However, that is probably not the case. First, most type checkers probably wouldn't understand how that version is added to
sys.path
, and it is probably good design of them not to (since they are static checkers). Second, as documented in #1717, it turns out this is never actually happening.