Closed EliahKagan closed 10 months ago
Thanks--if I contemplate significant changes to, or related to, gitdb, I'll begin by looking into whether I can remove GitPython's dependence on it or vendor the specific parts that GitPython needs.
In this case, I just noticed that I was aware of a way to fix the race condition that might be acceptable, so I figured I'd open a PR. Although there are some other changes I may want to propose to the CI workflow in this repository, I'm not sure I really know enough to fix the most important gitdb-related issues, in GitPython or in gitdb itself. As my familiarity with the GitPython codebase increases, that might change.
> Although there are some other changes I may want to propose to the CI workflow in this repository, I'm not sure I really know enough to fix the most important gitdb-related issues, in GitPython or in gitdb itself. As my familiarity with the GitPython codebase increases, that might change.
I would absolutely love if I could direct your incredible skill and energy away from the GitDB CI and towards making it unnecessary entirely :). It's probably more of a refactoring task, albeit a complex one.
> As my familiarity with the GitPython codebase increases, that might change.
It's a great gift to have you contribute to this project, and I love the idea of having more of that. The project, to my mind, has significant issues with quality, and some limitations stem from these. The greatest problems come from incorrect handling of encodings, both for paths and for data, along with many naive implementations of some git data structures (see `Index`).
With over half a billion downloads per year, GitPython is more influential than I feel comfortable with given its current state, and your work will be majorly impactful.
I also hope your contribution experience is rewarding and pleasant so you can continue this awesome work.
Thanks! Based on this, the only further gitdb CI pull request I will open now is one to re-add Python 3.7 support, as you indicated should be done in a few of the comments in https://github.com/gitpython-developers/GitPython/pull/1654. (That will include CI, but also changing the lower bound back to 3.7 in `setup.py`. It's very simple, because it's just a revert of one commit.)
Fixes #85
This replaces the conditional call to `os.mkdir`, which raises an unintended `FileExistsError` if the directory is created between the check and the `os.mkdir` call, with a single `os.makedirs` call with `exist_ok=True`.
This way, we attempt creation in a way that produces no error if the directory is already present, while still raising `FileExistsError` if a non-directory filesystem entry (such as a regular file) is present where we want the directory to be. This is the advantage of this approach over the approach of swallowing `FileExistsError` as suggested in #85.
Note, however, that `os.makedirs` behaves like `mkdir -p`: it attempts to create parent directories (and their parents, etc.) if they do not already exist. So it should only be used if that is acceptable in this case. I am not aware of a reason it wouldn't be, but I am not very familiar with gitdb.
So that aspect of the situation deserves special consideration in reviewing this PR. I'd be pleased to change the approach if `os.makedirs` is judged not suitable here. I think the approach suggested in #85 is reasonable, and it can be made more robust by checking that the directory exists after the creation attempt (or in other ways).
The code was under test: that line is exercised in `TestExamples.test_base`, `TestGitDB.test_writing`, `TestLooseDB.test_basics`, and `TestObjDBPerformance.test_large_data_streaming`. However, no test catches the race condition this fixes, and I have not added one.
Testing that the race condition does not occur in the specific way as before, by accessing and calling the same functions as before in the same order, would be easy, but it would be more of an illusion of a regression test than a useful test. Testing by trying to brute-force a race condition, without modifying the operation of the code for the test, would work, but the tests would take a very long time to run. Testing it in a way that is fairly robust against new ways of reintroducing the race condition and that is not too slow should be possible, but I don't know of a good way to do it; everything I've thought of would be complicated, and possibly make running the test in a debugger like `pdb` infeasible. So I have not added a regression test for this bug. However, if it is considered important to have one, then I can consider the matter further.
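For reference, here is a minimal sketch of the difference described above. It is not gitdb's actual code: the helper names `ensure_dir_racy`, `ensure_dir`, and `demo`, and the directory names used, are hypothetical and only illustrate the check-then-create pattern versus a single `os.makedirs(..., exist_ok=True)` call.

```python
import os
import tempfile


def ensure_dir_racy(path):
    """Check-then-create (hypothetical stand-in for the old pattern).

    If another thread or process creates `path` between the isdir check
    and the os.mkdir call, os.mkdir raises FileExistsError.
    """
    if not os.path.isdir(path):
        os.mkdir(path)


def ensure_dir(path):
    """Single-call creation (hypothetical stand-in for the new pattern).

    No error if the directory already exists, but FileExistsError is still
    raised if a non-directory entry occupies `path`. Like `mkdir -p`, this
    also creates any missing parent directories.
    """
    os.makedirs(path, exist_ok=True)


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "objects", "ab")
        ensure_dir(target)  # creates "objects" and "ab"
        ensure_dir(target)  # already present: no error
        results["created"] = os.path.isdir(target)

        # What the losing side of the race sees with a bare os.mkdir.
        try:
            os.mkdir(target)
        except FileExistsError:
            results["mkdir_on_existing_raises"] = True
        else:
            results["mkdir_on_existing_raises"] = False

        # A regular file where the directory should be is still an error.
        blocker = os.path.join(root, "blocker")
        with open(blocker, "w"):
            pass
        try:
            ensure_dir(blocker)
        except FileExistsError:
            results["file_blocks"] = True
        else:
            results["file_blocks"] = False
    return results


if __name__ == "__main__":
    print(demo())
```

The sketch does not reproduce the race itself; it only shows the behaviors that matter for the choice between the two approaches, including the parent-directory creation that deserves attention in review.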