gitpython-developers / GitPython

GitPython is a python library used to interact with Git repositories.
http://gitpython.readthedocs.org
BSD 3-Clause "New" or "Revised" License
4.66k stars 906 forks source link

Using GitPython inside a pre-commit hook breaks (GIT_AUTHOR_DATE breaks) #963

Open dries007 opened 5 years ago

dries007 commented 5 years ago

I suspect the culprit of the crash is the GIT_AUTHOR_DATE. If I remove this before starting my script it works. I already pass expand_vars=False, but that isn't passed far enough down to make a differance.

My current workaround is to nulke the enviroment variables when invoking my script by using env -i <script> from the commit hook.

My ideal solution would be a flag to disable the copy of the os environ on git/cmd.py:699. This will not only eliminate this issue, but also any potential other interferance (think GIT_DIR and friends).

Replicate via:

export GIT_INDEX_FILE='.git/index'
export GIT_AUTHOR_DATE='@1574863356 +0100'
export GIT_EXEC_PATH='/usr/lib/git-core'
export GIT_AUTHOR_EMAIL='admin@dries007.net'
export GIT_PREFIX=''
export GIT_AUTHOR_NAME='Dries'

python3
import tempfile

import git

def main():
    with tempfile.TemporaryDirectory() as folder:
        repo = git.Repo.init(folder, expand_vars=False)
        repo.index.commit("Init test repo with empty commit.")
        repo.head.reference = repo.create_head("master")

if __name__ == "__main__":
    main()

Thanks, Dries

Byron commented 4 years ago

Thanks for providing what’s needed to reproduce this issue, even though I didn’t thanks to writing this on an iPad. Something that would be nice to add to the issue is the stack trace that follows from executing the example.

Something I wonder is if you already verified that not copying the environment in https://github.com/gitpython-developers/GitPython/blob/master/git/cmd.py#L699 will fix the issue. If I recall correctly, child processes always inherit from their parent process environment, and the copy is meant to allow idempotent changes to the env dict.

dries007 commented 4 years ago

Just for your info: if you use the env kwarg in Popen, the parent's env is not cloned, but you're right, it doesn't actually fix the issue.

The after a bit more searching the root cause seems to be that GitPython tries to read a lot of enviroment variables when doing things like committing. In this case objects/commit.py#L345 and L354.

It's good that this crash happened or GitPython would have silently used the env variables which is something we don't want in this case. That's why I would appreciate the ability to disable that completely, for example with a kwarg to Repo's __init__ and init (I was thinking isolated). I woudln't mind trying to create a PR for this if it would be accepted.

Now for the GIT_AUTHOR_DATE parsing bug:

This is the full traceback:

Traceback (most recent call last):
  File "/home/dries/tmp/.venv/lib/python3.8/site-packages/git/objects/util.py", line 151, in parse_date
    timestamp = int(timestamp)
ValueError: invalid literal for int() with base 10: '@1574863356'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dries/tmp/GitPythonIssue.py", line 14, in <module>
    main()
  File "/home/dries/tmp/GitPythonIssue.py", line 9, in main
    repo.index.commit("Init test repo with empty commit.")
  File "/home/dries/tmp/.venv/lib/python3.8/site-packages/git/index/base.py", line 955, in commit
    rval = Commit.create_from_tree(self.repo, tree, message, parent_commits,
  File "/home/dries/tmp/.venv/lib/python3.8/site-packages/git/objects/commit.py", line 349, in create_from_tree
    author_time, author_offset = parse_date(author_date_str)
  File "/home/dries/tmp/.venv/lib/python3.8/site-packages/git/objects/util.py", line 205, in parse_date
    raise ValueError("Unsupported date format: %s" % string_date)
ValueError: Unsupported date format: @1574863356 +0100

Process finished with exit code 1

It seems that adding this:

if timestamp.startswith('@'):
    timestamp = timestamp[1:]

in the parse_date function in on objects/util.py#L150 will fix that.

Turns out the @ prefix is something from GNU corutil's date command.

I can make that a part of my PR too if you like.

Byron commented 4 years ago

Sorry for the late reply! @dries007 I would be very happy about providing something like 'isolated' as keyword argument to the functions that need it. That would allow them to be pure and thus less surprising in some situations.