gitpython-developers / GitPython

GitPython is a python library used to interact with Git repositories.
http://gitpython.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Git diff pathspec missing -- #1931

Closed: kitserve closed this issue 3 months ago

kitserve commented 3 months ago

I've been hitting an issue where:

import git

repo = git.Repo('path/to/repo')
for commit in repo.iter_commits():
    if len(commit.parents) == 0:
        previous_commit = None
    else:
        previous_commit = commit.parents[0].tree
    for file in commit.stats.files:
        status = repo.git.diff('--name-status', previous_commit, file)

keeps blowing up with an exception along the lines of:

git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git diff --name-status fea6f08304c7e01f3dae87947901eee1fcba55eb example.txt
  stderr: 'fatal: ambiguous argument 'example.txt': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]''

I've had to work around it with a raw command that includes the -- argument like:

status = repo.git.diff('--name-status', previous_commit.hexsha, '--', file)

This seems related to #1061. Looking at https://github.com/gitpython-developers/GitPython/blob/ee987dac1c7456c9bda7bd62e8ac2952da38e31e/git/diff.py#L268, the code there appears to be trying to add the -- argument before the pathspec, but it's not working. I'm not sure if this is me doing something wrong or a genuine bug, but either way I think either the documentation or the code needs updating to address this issue.
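For anyone reading along, the workaround above boils down to always placing -- between the revision and the paths, so that git never tries to interpret a file name as a revision. A minimal sketch of that argument ordering follows; name_status_args is a hypothetical helper made up for illustration, not something GitPython provides:

```python
from typing import List, Sequence


def name_status_args(revision: str, paths: Sequence[str]) -> List[str]:
    """Build arguments for `git diff --name-status <revision> -- <paths...>`.

    Hypothetical helper for illustration only; it is not part of GitPython.
    """
    # Everything after "--" is treated by git as a pathspec, never as a revision.
    return ["--name-status", revision, "--", *paths]


args = name_status_args("fea6f08304c7e01f3dae87947901eee1fcba55eb", ["example.txt"])
print(args)
# With GitPython the list would then be unpacked into the raw command wrapper,
# for example: repo.git.diff(*args)
```

Building the list in one place keeps the separator from being forgotten when the call is repeated for many files.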

Byron commented 3 months ago

There is a mixup between repo.git.diff and repo.diff - one is calling the Git program directly, the other one abstracts it.
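To illustrate the distinction: a call such as repo.git.diff('--name-status', ...) hands its arguments to the git program and returns the command's text output, so the caller has to parse that text. The sketch below assumes the usual tab-separated --name-status format (a status letter, optionally followed by a similarity score, then one or two paths); parse_name_status is a hypothetical helper, not part of GitPython:

```python
from typing import Dict


def parse_name_status(output: str) -> Dict[str, str]:
    """Map each path to its status letter from `git diff --name-status` text.

    Hypothetical helper for illustration only; it is not part of GitPython.
    """
    statuses: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        status = fields[0][0]  # "R100" -> "R"
        path = fields[-1]      # for renames and copies, the last field is the new path
        statuses[path] = status
    return statuses


sample = "M\texample.txt\nA\tnew.txt\nR100\told.txt\trenamed.txt"
print(parse_name_status(sample))
```

The abstracted API avoids this parsing step by returning objects that expose the change type directly.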

kitserve commented 3 months ago

Well, perhaps this isn't a bug in the code. If that's the case, can the documentation be clarified? If I've understood your comment, I should be trying something like:

status = repo.diff(commit, file, '--name-status')

but that throws

AttributeError: 'Repo' object has no attribute 'diff'

What am I missing?

Byron commented 3 months ago

It's true, repo.diff doesn't actually exist, apologies. However, here is a paragraph in the docs that serves as an introduction.

kitserve commented 3 months ago

Thanks. In case it helps someone else who gets confused by the same thing, I ended up switching to something along the lines of:

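import git

# SHA-1 of Git's empty tree, used as the "parent" of a root commit
EMPTY_TREE_SHA = '4b825dc642cb6eb9a060e54bf8d69288fbee4904'
repo = git.Repo('path/to/repo')
for commit in repo.iter_commits():
    if commit.parents:
        previous_commit = commit.parents[0]
    else:
        previous_commit = EMPTY_TREE_SHA

    # R=True passes -R to git, reversing the diff so it reads parent -> commit
    diffs = {
        diff.a_path: diff for diff in commit.diff(previous_commit, R=True)
    }

    for file, stats in commit.stats.files.items():
        diff = diffs.get(file)
        if diff is None:
            # No diff entry is keyed by this exact path; skip it
            continue
        status = diff.change_type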
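A note on the EMPTY_TREE_SHA constant used above, in case it looks like a magic number: it is the SHA-1 object ID git assigns to a tree with no entries. Git hashes an object as a header of the form "<type> <size>" followed by a NUL byte and then the content, so the value can be reproduced with the standard library alone. The function name below is made up for illustration, and the sketch assumes a SHA-1 repository:

```python
import hashlib


def empty_tree_sha1() -> str:
    """Compute the SHA-1 object ID of git's empty tree (illustrative helper)."""
    content = b""  # an empty tree has no entries
    header = b"tree " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(empty_tree_sha1())
```

Repositories created with a different object format (such as SHA-256) would use a different value, so the hard-coded constant only applies to SHA-1 repositories.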