aspiers / git-deps

git commit dependency analysis tool
GNU General Public License v2.0

UTF-8 decode error #103

Open futureweihk opened 4 years ago

futureweihk commented 4 years ago

Dear Sir,

We ran a normal command, git deps -d 90xxxxx0deeca0a98fbac4368c547bd650c99a95, and got the error below:

Traceback (most recent call last):
  File "/usr/local/bin/git-deps", line 11, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.6/site-packages/git_deps/cli.py", line 141, in run
    main(sys.argv[1:])
  File "/usr/local/lib/python3.6/site-packages/git_deps/cli.py", line 135, in main
    cli(options, args)
  File "/usr/local/lib/python3.6/site-packages/git_deps/cli.py", line 119, in cli
    detector.find_dependencies(rev)
  File "/usr/local/lib/python3.6/site-packages/git_deps/detector.py", line 122, in find_dependencies
    self.find_dependencies_with_parent(dependent, parent)
  File "/usr/local/lib/python3.6/site-packages/git_deps/detector.py", line 147, in find_dependencies_with_parent
    self.blame_hunk(dependent, parent, path, hunk)
  File "/usr/local/lib/python3.6/site-packages/git_deps/detector.py", line 172, in blame_hunk
    blame = subprocess.check_output(cmd, universal_newlines=True)
  File "/usr/local/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/local/lib/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/local/lib/python3.6/subprocess.py", line 850, in communicate
    stdout = self.stdout.read()
  File "/usr/local/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbd in position 624: invalid start byte

With debug mode enabled, the last part of the log, from the final completed "Blaming hunk" onwards, is as follows:

Sep 21 11:30:16 DEBUG Blaming hunk -454,8 @ c7cfcd69
Sep 21 11:30:16 DEBUG !863359ffe0c44c26890b903dfa637c139f6e60ef 454 454 4 90877300deeca0a98fbac4368c547bd650c99a95 863359ffe0c44c26890b903dfa637c139f6e60ef
Sep 21 11:30:16 DEBUG New dependency 90877300 -> 863359ff via line 454 (prod standalone.xml add VN,KH DB info)
Sep 21 11:30:16 DEBUG New line for 90877300 -> 863359ff: 863359ffe0c44c26890b903dfa637c139f6e60ef 454 454 4
Sep 21 11:30:16 DEBUG !author XXX16374
Sep 21 11:30:16 DEBUG !author-mail XXX16374@xxxxxxx
Sep 21 11:30:16 DEBUG !author-time 1592396697
Sep 21 11:30:16 DEBUG !author-tz +0800
Sep 21 11:30:16 DEBUG !committer XXX16374
Sep 21 11:30:16 DEBUG !committer-mail XXX16374@xxxxxxx
Sep 21 11:30:16 DEBUG !committer-time 1592396697
Sep 21 11:30:16 DEBUG !committer-tz +0800
Sep 21 11:30:16 DEBUG !summary prod standalone.xml add VN,KH DB info
Sep 21 11:30:16 DEBUG !previous 315fe9922d84db614e42a0c18ae48a369adedf83 RESOURCES/TW_SVC/prod/jboss-eap-7.0/standalone.xml
Sep 21 11:30:16 DEBUG !filename RESOURCES/TW_SVC/prod/jboss-eap-7.0/standalone.xml
Sep 21 11:30:16 DEBUG ! oracle.net.encryption_client=REQUIRED,
Sep 21 11:30:16 DEBUG !863359ffe0c44c26890b903dfa637c139f6e60ef 455 455
Sep 21 11:30:16 DEBUG New line for 90877300 -> 863359ff: 863359ffe0c44c26890b903dfa637c139f6e60ef 455 455
Sep 21 11:30:16 DEBUG ! oracle.net.encryption_types_client=(AES256,AES192),
Sep 21 11:30:16 DEBUG !863359ffe0c44c26890b903dfa637c139f6e60ef 456 456
Sep 21 11:30:16 DEBUG New line for 90877300 -> 863359ff: 863359ffe0c44c26890b903dfa637c139f6e60ef 456 456
Sep 21 11:30:16 DEBUG ! oracle.net.crypto_checksum_client=REQUIRED,
Sep 21 11:30:16 DEBUG !863359ffe0c44c26890b903dfa637c139f6e60ef 457 457
Sep 21 11:30:16 DEBUG New line for 90877300 -> 863359ff: 863359ffe0c44c26890b903dfa637c139f6e60ef 457 457
Sep 21 11:30:16 DEBUG ! oracle.net.crypto_checksum_types_client=SHA1
Sep 21 11:30:16 DEBUG !06803a1f010cb0a8b75919e1a6870f7f8e835250 521 458 1
Sep 21 11:30:16 DEBUG New line for 90877300 -> 06803a1f: 06803a1f010cb0a8b75919e1a6870f7f8e835250 521 458 1
Sep 21 11:30:16 DEBUG !author XXX16374
Sep 21 11:30:16 DEBUG !author-mail XXX16374@xxxxxxx
Sep 21 11:30:16 DEBUG !author-time 1584068622
Sep 21 11:30:16 DEBUG !author-tz +0800
Sep 21 11:30:16 DEBUG !committer XXX16374
Sep 21 11:30:16 DEBUG !committer-mail XXX16374@xxxxxxxxx
Sep 21 11:30:16 DEBUG !committer-time 1584068622
Sep 21 11:30:16 DEBUG !committer-tz +0800
Sep 21 11:30:16 DEBUG !summary update RESOURCES
Sep 21 11:30:16 DEBUG !filename RESOURCES/TW_SVC/prod/jboss-eap-7.0/standalone.xml
Sep 21 11:30:16 DEBUG !
Sep 21 11:30:16 DEBUG !863359ffe0c44c26890b903dfa637c139f6e60ef 459 459 2
Sep 21 11:30:16 DEBUG New line for 90877300 -> 863359ff: 863359ffe0c44c26890b903dfa637c139f6e60ef 459 459 2
Sep 21 11:30:16 DEBUG ! oracle7_XA
Sep 21 11:30:16 DEBUG !863359ffe0c44c26890b903dfa637c139f6e60ef 460 460
Sep 21 11:30:16 DEBUG New line for 90877300 -> 863359ff: 863359ffe0c44c26890b903dfa637c139f6e60ef 460 460
Sep 21 11:30:16 DEBUG ! ALTER SESSION SET current_schema=XXXXX
Sep 21 11:30:16 DEBUG !06803a1f010cb0a8b75919e1a6870f7f8e835250 523 461 1
Sep 21 11:30:16 DEBUG New line for 90877300 -> 06803a1f: 06803a1f010cb0a8b75919e1a6870f7f8e835250 523 461 1
Sep 21 11:30:16 DEBUG ! TRANSACTION_READ_COMMITTED
Sep 21 11:30:16 DEBUG !
Sep 21 11:30:16 DEBUG |-------- ----- @@ -454,8 +454,8 @@
Sep 21 11:30:16 DEBUG |863359ff 454 oracle.net.encryption_client=REQUIRED,
Sep 21 11:30:16 DEBUG |863359ff 455 - oracle.net.encryption_types_client=(AES256,AES192),
Sep 21 11:30:16 DEBUG |863359ff 456 - oracle.net.crypto_checksum_client=REQUIRED,
Sep 21 11:30:16 DEBUG |863359ff 457 - oracle.net.crypto_checksum_types_client=SHA1
Sep 21 11:30:16 DEBUG | + oracle.net.encryption_types_client=(AES256,AES192),
Sep 21 11:30:16 DEBUG | + oracle.net.crypto_checksum_client=REQUIRED,
Sep 21 11:30:16 DEBUG | + oracle.net.crypto_checksum_types_client=SHA1
Sep 21 11:30:16 DEBUG |06803a1f 458
Sep 21 11:30:16 DEBUG |863359ff 459 - oracle7_XA
Sep 21 11:30:16 DEBUG |863359ff 460 - ALTER SESSION SET current_schema=XXXXX
Sep 21 11:30:16 DEBUG | + oracle8_XA
Sep 21 11:30:16 DEBUG | + ALTER SESSION SET current_schema=XXXXX
Sep 21 11:30:16 DEBUG |06803a1f 461 TRANSACTION_READ_COMMITTED
Sep 21 11:30:16 DEBUG Blaming hunk -470,3 @ c7cfcd69

Does the log mean that the error occurs at line 470? Many thanks for your help.
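One way to check this would be to scan the file contents for lines that are not valid UTF-8. The following is only a rough sketch: `find_non_utf8_lines` is a hypothetical helper, not part of git-deps, and the sample bytes are made up. Note that the blame output also includes metadata such as author names and commit summaries, so the offending byte is not necessarily in the file content itself.

```python
def find_non_utf8_lines(data: bytes):
    """Return (line number, byte offset in line, byte value) for each
    line of `data` that cannot be decoded as UTF-8."""
    bad = []
    for lineno, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first undecodable byte in the line
            bad.append((lineno, exc.start, line[exc.start]))
    return bad


# Made-up sample: the second line contains a stray 0xbd byte.
sample = b"ok line\nbad \xbd byte\nalso ok\n"
result = find_non_utf8_lines(sample)
print(result)
```

To apply it to the real file, the bytes could be taken from something like `git show c7cfcd69:RESOURCES/TW_SVC/prod/jboss-eap-7.0/standalone.xml`, captured without decoding.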

aspiers commented 4 years ago

This is most likely a bug in the approach to UTF-8 decoding. Perhaps the data being decoded is not actually UTF-8. Either way it will be related to the use of Python 3, as I haven't gotten around to doing heavy testing on Python 3 yet. See also #98 and #87 which are both related to Python 3.
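To illustrate the failure mode, here is a minimal sketch using made-up sample bytes (not data from the affected repository). The byte 0xbd has the form of a UTF-8 continuation byte, so it cannot begin a character, and a strict decode raises the same error as in the traceback:

```python
# Made-up sample: ASCII text with a stray 0xbd byte in the middle.
data = b"abc\xbddef"

try:
    data.decode("utf-8")  # strict decoding is the default
    reason, position = None, None
except UnicodeDecodeError as exc:
    reason, position = exc.reason, exc.start

print(reason, position)

# With errors="replace", the undecodable byte becomes U+FFFD instead of raising.
replaced = data.decode("utf-8", errors="replace")
print(replaced)
```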

aspiers commented 4 years ago

Are you able to share the repository which caused this bug, so we can try to reproduce?

futureweihk commented 4 years ago

This is most likely a bug in the approach to UTF-8 decoding. Perhaps the data being decoded is not actually UTF-8. Either way it will be related to the use of Python 3, as I haven't gotten around to doing heavy testing on Python 3 yet. See also #98 and #87 which are both related to Python 3.

Thanks. So do you mean that using Python 2.7 would avoid this error?

futureweihk commented 4 years ago

How can we change the Python version that git-deps runs on? Do we need to reinstall git-deps, or is there a parameter we can change to achieve this? Thanks.

aspiers commented 4 years ago

@futureweihk commented on September 22, 2020 4:54 AM:

Thanks. So do you mean that using Python 2.7 would avoid this error?

Possibly - I'd say there is a good chance but I can't guarantee it.

@futureweihk commented on September 22, 2020 4:56 AM:

How can we change the Python version that git-deps runs on? Do we need to reinstall git-deps, or is there a parameter we can change to achieve this? Thanks.

That depends very much on your OS and how you normally install Python. Please just follow standard Python installation documentation, as I do not have time to provide general support for Python. git-deps does not do anything significantly different to other Python programs, so standard procedures work as normal. If you are not too familiar with Python then you could alternatively find a Python consultant to help. It is not hard.

futureweihk commented 4 years ago

Sir,

We changed line 173 of detector.py from:

blame = subprocess.check_output(cmd, universal_newlines=True)

to:

blame = subprocess.check_output(cmd, encoding="utf-8", errors="replace", universal_newlines=True)

git-deps now seems to run without the UTF-8 error. Do you have any suggestions on this approach?
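For reference, the effect of this change can be sketched in isolation. In the snippet below a small child Python process stands in for the real `git blame` command (an assumption made purely so that the example is self-contained), and it writes one byte that is not valid UTF-8. On Python 3.6 and later, passing `encoding` already puts the pipe in text mode, so `universal_newlines=True` should be redundant here.

```python
import subprocess
import sys

# Stand-in for the real blame command: a child process that emits
# a line containing the non-UTF-8 byte 0xbd.
cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'line \\xbd one\\n')",
]

# errors="replace" turns each undecodable byte into U+FFFD instead of
# raising UnicodeDecodeError.
blame = subprocess.check_output(cmd, encoding="utf-8", errors="replace")
print(repr(blame))
```

The trade-off is that the original bytes are lost wherever a replacement character appears. That is probably acceptable if only the commit hashes and line numbers from the blame output are used, but this has not been verified against the git-deps code.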

aspiers commented 3 years ago

Thanks, that's very helpful. I still need this though:

@aspiers commented on September 21, 2020 5:14 PM:

Are you able to share the repository which caused this bug, so we can try to reproduce?

so that I can reproduce and test the fix. Please can you share it?