Running is_dirty() on my repo takes 5 minutes because it's a large repo, has text conversion enabled for diffs, and is_dirty() is outputting a full diff. is_dirty() should be a relatively simple operation, but since it uses git diff instead of a plumbing command like git diff-files it incurs the cost of displaying nice output for users.
The diff.astextplain.textconv git option converts pdf files to text before diffing. This option appears to come with msys git. It's useful when diffing interactively, but it adds a lot of overhead when just checking for dirty state.
GitPython doesn't look at the output of the diff; it just checks that it's not empty:
https://github.com/gitpython-developers/GitPython/blob/3470fb3e5ff7f77e5bd19bc264163cd31db4a5df/git/repo/base.py#L957-L977
If we switch from diff to diff-index, we can see that it's comparable in speed to turning off the text conversions:
(These timings are all after running these commands several times. When I first ran git diff it took 26 minutes!)
Workaround
Add a .gitattributes that disables the text conversion:
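A minimal sketch of such a file, assuming the conversion is triggered by a diff=astextplain attribute that the msys git system configuration assigns to pdf files. The *.pdf pattern is an assumption; other file types covered by the same driver would need their own lines.

```
# .gitattributes at the repository root (sketch; the pattern is an assumption)
# "-diff" unsets the diff attribute, so git treats matching files as binary
# and skips the textconv filter when diffing them.
*.pdf -diff
```

The trade-off is that interactive git diff will also stop showing converted text for these files in this repo.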
Solution
I think is_dirty should instead use diff-files and diff-index. This answer looks like a good explanation of how they work.
Here's my rough replacement for is_dirty:
I'll try to find time to make a proper patch if that sounds good.
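For illustration, one way the diff-files / diff-index approach could look. This is a sketch under assumptions, not GitPython's implementation: it shells out to git with subprocess instead of going through GitPython, and is_dirty_fast and run_git are hypothetical names. The index, working_tree and untracked_files parameters mirror the existing is_dirty() arguments; submodule and path handling are left out.

```python
import subprocess


def run_git(repo_path, *args):
    """Run a git command inside repo_path and return (exit code, stdout)."""
    proc = subprocess.run(
        ["git", "-C", repo_path, *args],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def is_dirty_fast(repo_path, index=True, working_tree=True,
                  untracked_files=False, git=run_git):
    """Hypothetical stand-in for is_dirty() built on plumbing commands."""

    def differs(*args):
        # With --quiet, exit code 0 means "no differences" and 1 means
        # "differences". Anything else is an error, e.g. no HEAD commit yet.
        code, _ = git(repo_path, *args)
        if code not in (0, 1):
            raise RuntimeError(
                f"git {' '.join(args)} failed with exit code {code}"
            )
        return code == 1

    # Staged changes: compare the index against HEAD.
    if index and differs("diff-index", "--cached", "--quiet", "HEAD"):
        return True

    if working_tree:
        # Refresh cached stat info so files that were only touched are not
        # reported as modified. The exit code of this call is ignored.
        git(repo_path, "update-index", "-q", "--refresh")
        # Unstaged changes: compare the working tree against the index.
        if differs("diff-files", "--quiet"):
            return True

    if untracked_files:
        _, out = git(repo_path, "ls-files", "--others", "--exclude-standard")
        if out.strip():
            return True

    return False
```

Calling is_dirty_fast("/path/to/repo") returns a bool. The plumbing commands do not apply textconv filters unless asked to, and --quiet stops them from producing any diff output, which is what avoids the pdf conversion cost. The git parameter exists only so the command runner can be swapped out, for example in tests.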