aspiers / git-deps

git commit dependency analysis tool
GNU General Public License v2.0
298 stars 47 forks source link

KeyError in blame_hunk #50

Open bmwiedemann opened 9 years ago

bmwiedemann commented 9 years ago

I ran into some bug (on openSUSE-13.2 and Factory) using:

git clone git://github.com/openstack/python-keystoneclient.git
cd python-keystoneclient/
git-deps 3d6d749e6f0fef682a88758e1a2f6c9e8e7bd23c

this produced

7920899af119d1697c333d202ca3272f167c19b0
[...some more hashes...]
Traceback (most recent call last):
  File "/home/bernhard/bin/git-deps", line 790, in <module>
    main()
  File "/home/bernhard/bin/git-deps", line 784, in main
    cli(options, args)
  File "/home/bernhard/bin/git-deps", line 671, in cli
    detector.find_dependencies(dependent_rev)
  File "/home/bernhard/bin/git-deps", line 402, in find_dependencies
    self.find_dependencies_with_parent(dependent, parent)
  File "/home/bernhard/bin/git-deps", line 426, in find_dependencies_with_parent
    self.blame_hunk(dependent, parent, path, hunk)
  File "/home/bernhard/bin/git-deps", line 529, in blame_hunk
    rev = line_to_culprit[line_num]
KeyError: 2
dwm1945 commented 7 years ago

Is there any feedback available on the status of this bug? My exception isn't quite identical and since I used --recurse the conditions aren't quite the same, but what I get is:

Traceback (most recent call last):
  File "/home/dmorris/Tools/git-deps/git-deps.py", line 794, in <module>
    main()
  File "/home/dmorris/Tools/git-deps/git-deps.py", line 788, in main
    cli(options, args)
  File "/home/dmorris/Tools/git-deps/git-deps.py", line 675, in cli
    detector.find_dependencies(dependent_rev)
  File "/home/dmorris/Tools/git-deps/git-deps.py", line 406, in find_dependencies
    self.find_dependencies_with_parent(dependent, parent)
  File "/home/dmorris/Tools/git-deps/git-deps.py", line 430, in find_dependencies_with_parent
    self.blame_hunk(dependent, parent, path, hunk)
  File "/home/dmorris/Tools/git-deps/git-deps.py", line 533, in blame_hunk
    rev = line_to_culprit[line_num]

My command:

git-deps.py --recurse   9a0cd44c4e9930c40b8ad128a12999061599dcf4 e5e4e280693b8fd9db389aa3defd7a2b1fee013f

from which I get about 20 pairs of commit IDs before the traceback. W/o the --recurse flag, I get a list of 5 commit IDs and no exception.

dwm1945 commented 7 years ago

After adding a bit of debug logging, what I found is that hunk.lines has the gratuitous extra line added matching: "\n\ No newline at end of file" which I presume is because the file in git didn't end with "\n"

I added filtering for that string to the loop and that prevents the exception ... though now I'm getting what seems to be an impossibly long list of dependencies.

aspiers commented 7 years ago

Cool, thanks!!

Haven't had time to check yet but I would guess that if you are getting an impossibly long list then it's because you're using --recurse without also using --exclude-commits to set an upper bound on the set of commits. Commit dependency trees are often very thick, so if you recurse without bounds it's not uncommon to end up traversing the majority of the repository's history.

dwm1945 commented 7 years ago

Long list ... yes I hacked in a change to read the excludes from a file where the excludes were obtained from "git log" on the branch I'm backporting to. The list now reflects what I tend to expect. Only a couple hundred commits.

aspiers commented 7 years ago

Sounds like an interesting change - feel free to submit a PR ;-)

aspiers commented 7 years ago

Also, would be very grateful if you could provide the patch which fixes the original KeyError issue!

dwm1945 commented 7 years ago

will do both when I have completed this major backport ... I'll have more confidence then that what I've done is complete ....

The current patch for the keyerror:

        diff_format = '      |%8.8s %5s %s%s'
        hunk_header = '@@ %s %s @@' % (line_range_before, line_range_after)
        self.logger.debug(diff_format % ('--------', '-----', '', hunk_header))
        line_num = hunk.old_start
        for line in hunk.lines:
            if "\n\\ No newline at end of file" == line.content.rstrip() :
                break
            if line.origin == '+':

I inserted the two lines following "for line in hunk.lines. This is a hack and I didn't spend any time walking backwared to the creation of the hunk list where this would have been more appropriate.