gitpython-developers / GitPython

GitPython is a python library used to interact with Git repositories.
http://gitpython.readthedocs.org
BSD 3-Clause "New" or "Revised" License
4.62k stars 906 forks source link

GitPython's overloads on standard library must not blow #600

Open yarikoptic opened 7 years ago

yarikoptic commented 7 years ago

Originally discovered/reported in #598 (fixing somewhat the situation but not really).

(Pdb) bt
  /usr/lib/python2.7/unittest/case.py(329)run()
-> testMethod()
  /usr/lib/python2.7/dist-packages/nose/loader.py(418)loadTestsFromName()
-> addr.filename, addr.module)
  /usr/lib/python2.7/dist-packages/nose/importer.py(47)importFromPath()
-> return self.importFromDir(dir_path, fqname)
  /usr/lib/python2.7/dist-packages/nose/importer.py(94)importFromDir()
-> mod = load_module(part_fqname, fh, filename, desc)
  /home/buildbot/datalad-pr-docker-dl-nd90/build/datalad/tests/test__main__.py(17)<module>()
-> from .. import __main__, __version__
  /home/buildbot/datalad-pr-docker-dl-nd90/build/datalad/__main__.py(13)<module>()
-> from .auto import AutomagicIO
  /home/buildbot/datalad-pr-docker-dl-nd90/build/datalad/auto.py(20)<module>()
-> import h5py
  /usr/lib/python2.7/dist-packages/h5py/__init__.py(60)<module>()
-> from .tests import run_tests
  /usr/lib/python2.7/dist-packages/h5py/tests/__init__.py(15)<module>()
-> from . import old, hl
  /usr/lib/python2.7/dist-packages/h5py/tests/old/__init__.py(4)<module>()
-> from . import ( test_attrs,
  /usr/lib/python2.7/dist-packages/h5py/tests/old/test_group.py(41)<module>()
-> fsencode(u"α")
  /usr/lib/python2.7/dist-packages/h5py/_hl/compat.py(68)fsencode()
-> return filename.encode(encoding, errors)
  /usr/lib/python2.7/dist-packages/git/compat.py(180)surrogateescape_handler()
-> decoded = replace_surrogate_encode(mystring)
> /usr/lib/python2.7/dist-packages/git/compat.py(207)replace_surrogate_encode()
-> raise exc

as you can see -- there is no import of git anywhere in the stack (was imported some time before) and the surrogateescape_handler kicks in, and then fails in a "non standard way". If it does provide some custom handling, IMHO whatever exception it could ever throw should be a subclass of possibly expected "upstairs" UnicodeEncodeError. Here is a bit more detail from the stack

(Pdb) l
202             code = ord(ch)
203     
204             # The following magic comes from Py3.3's Python/codecs.c file:
205             if not 0xD800 <= code <= 0xDCFF:
206                 # Not a surrogate. Fail with the original exception.
207  ->             raise exc
208             # mybytes = [0xe0 | (code >> 12),
209             #            0x80 | ((code >> 6) & 0x3f),
210             #            0x80 | (code & 0x3f)]
211             # Is this a good idea?
212             if 0xDC00 <= code <= 0xDC7F:
(Pdb) p code
945
(Pdb) up
> /usr/lib/python2.7/dist-packages/git/compat.py(180)surrogateescape_handler()
-> decoded = replace_surrogate_encode(mystring)
(Pdb) p mystring
u'\u03b1'
(Pdb) up
> /usr/lib/python2.7/dist-packages/h5py/_hl/compat.py(68)fsencode()
-> return filename.encode(encoding, errors)
(Pdb) p filename
u'\u03b1'
(Pdb) up
> /usr/lib/python2.7/dist-packages/h5py/tests/old/test_group.py(41)<module>()
-> fsencode(u"α")
(Pdb) l
 38     # If we can't encode unicode filenames, there's not much point failing tests
 39     # which must fail
 40     try:
 41         fsencode(u"α")
 42     except UnicodeEncodeError:
 43  ->     NO_FS_UNICODE = True
 44     else:
 45         NO_FS_UNICODE = False
 46     
 47     
 48     class BaseGroup(TestCase):
Byron commented 7 years ago

Thanks for posting! What should be done better than what was done in the linked PR #598 ?

yarikoptic commented 7 years ago

I think so

Byron commented 7 years ago

Thank you, now I understand. I was assuming that NotASurrogateError was already a derivative of UnicodeError. In any case, creating such an exception should be a piece of cake, right?

yarikoptic commented 7 years ago

I guess so... and would appreciate a quick fix up -- ran into the same problem in a different scenario now :-(

Byron commented 7 years ago

I think this should be implemented by someone who can reproduce the issue. Creating a UnicodeDecodeError with its 5 arguments shouldn't be impossible.