ScottDuckworth / python-anyvcs

A Python abstraction layer for multiple version control systems
BSD 3-Clause "New" or "Revised" License

svn: only export single path in diff(path!=None) #55

Closed. OEP closed this 10 years ago

OEP commented 10 years ago

Subversion can export a particular path at a cost dependent on the size of the file. This speeds up diff() drastically when only looking at a single, smaller file.
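As a rough illustration of the single-path export described above, the command line might be assembled like this. This is only a sketch, not the code from this pull request: the helper name `export_command`, the `file://` repository URL, and the destination path are hypothetical, and exporting a single file this way is assumed to work with the installed Subversion client.

```python
def export_command(repo_url, path, rev, dest):
    """Build an `svn export` command line for one path at one revision.

    Hypothetical helper for illustration; it only builds the argument
    list and does not run Subversion itself.
    """
    # Join the repository URL and the in-repository path, then add a peg
    # revision so the path is looked up as it existed at `rev`.
    target = "%s/%s@%d" % (repo_url.rstrip("/"), path.lstrip("/"), rev)
    return ["svn", "export", "--quiet", "-r", str(rev), target, dest]


if __name__ == "__main__":
    # Example values are made up; only the in-repository path comes
    # from the timing run in this comment.
    print(export_command("file:///srv/repo", "trunk/manifests/site.pp",
                         320, "/tmp/site.pp"))
```

The resulting list could be passed to `subprocess.check_call`. Exporting one file instead of the whole tree is what keeps the cost proportional to the size of that file.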

$> time python testdiff.py /local/pkilgo/puppet-main/ 1 320 trunk/manifests/site.pp > /dev/null

real    0m3.124s
user    0m2.744s
sys     0m0.328s

--- vs. (with this change) ---

$> time python testdiff.py /local/pkilgo/puppet-main/ 1 320 trunk/manifests/site.pp > /dev/null

real    0m0.095s
user    0m0.044s
sys     0m0.028s

OEP commented 10 years ago

I'm not so sure about Svn 1.5 compatibility. The documentation doesn't mention this usage:

OEP commented 10 years ago

I think this is finally done. I've tested it in the original problem area and can't find any issues. Here's a short summary of what went wrong:

  1. I switched to using svn diff as the primary (fastest) means of getting a diff. Subversion fails if you try to svn diff a path that didn't exist at one of the provided revisions. I could not find a way to force Subversion to produce a diff in that case (though there may be one), so I wrote a special handler for it that falls back to Python's difflib.
  2. Several encoding problems surfaced because Python 3's difflib won't work on bytes(). It looks like diff() and pdiff() ought to return strings, but they were returning bytes in Python 3. So I modified the unit tests to check that the results of diff() and pdiff() have the correct type, and modified the library code accordingly. This could affect Python 3 users who were programming around that undocumented return type.
  3. Of course, there were encoding issues with binary files. For these I'm just comparing the SHA-1 hashes. The downside is that the entire contents of both files have to be read, but I couldn't measure a performance difference compared with reading only the minimal number of bytes, so I left it in.
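
The fallback path from the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the handler from this pull request: the function names, the NUL-byte test for binary content, the UTF-8 default, and the "Binary files ... differ" message are all hypothetical choices.

```python
import difflib
import hashlib


def looks_binary(data):
    # Assumed heuristic: content containing a NUL byte is treated as binary.
    return b"\0" in data


def fallback_diff(path, old, new, encoding="utf-8"):
    """Return a unified diff as str for two versions of `path`.

    `old` and `new` are bytes, or None when the path did not exist at
    that revision (the case where `svn diff` fails).
    """
    old = b"" if old is None else old
    new = b"" if new is None else new

    if looks_binary(old) or looks_binary(new):
        # Binary content cannot be diffed line by line, so only report
        # whether it changed, by comparing SHA-1 digests.
        if hashlib.sha1(old).digest() == hashlib.sha1(new).digest():
            return ""
        return "Binary files %s differ\n" % path

    # difflib's unified_diff works on str lines, so decode the bytes first.
    old_lines = old.decode(encoding, "replace").splitlines(True)
    new_lines = new.decode(encoding, "replace").splitlines(True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=path, tofile=path)
    )


if __name__ == "__main__":
    # A file that did not exist at the first revision.
    print(fallback_diff("trunk/manifests/site.pp", None, b"node default {}\n"))
```

Decoding before calling difflib keeps the return type a string on both Python 2 and Python 3, which is the type the updated unit tests check for.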