Closed samczsun closed 7 years ago
Ug, unicode errors. No matter how many times I try to fix them, they seem to pop up again. I wish I could use Python 3 style strings.
Why not use Python 3? I'm not familiar with the history of Krakatau so apologies if you've mentioned it somewhere
Mostly because it's a ton of work to upgrade and PyPy has poor support.
Don't forget about from __future__ import unicode_literals
I have the same problem. I'm getting this error in every file I try to disassemble.
processing target dec/Main.class, 1/1 remaining
Traceback (most recent call last):
File "disassemble.py", line 63, in <module>
disassembleClass(readTarget, targets, args.out, args.roundtrip)
File "disassemble.py", line 31, in disassembleClass
filename = out.write(name, output.getvalue())
File "PATH\Krakatau\Krakatau\script_util.py", line 109, in write
out = self.makepath(cname)
File "PATH\Krakatau\Krakatau\script_util.py", line 88, in winSanitizePath
return '\?{}{}{}'.format(base, path, suffix)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 30-31: ordinal not in range(128)
You are on Windows, right? I'm not sure how that is even possible, because on Windows, it sanitizes paths to never use non-ascii characters. I even just double checked the code in question.
Hi!
I just want to bump this issue (the original one), as it's fairly annoying. It also happens when disassembling classes, btw.
Could you provide an example jar? I suppose I can at least try to fix the places where people are seeing errors.
Sure, I'll do it today (btw, when disassembling, the last line of the error message is the same, except that the character and the position change, obviously).
Here is the file, it's just a renamed .jar file.
Edit: Btw. I just used Krakatau in combination with https://github.com/helios-decompiler/helios
Should be fixed now. Tell me if you see any more issues.
I've upgraded Helios with the latest revision of Krakatau but it appears that the UnicodeDecodeError still occurs:
Traceback (most recent call last):
File "disassemble.py", line 63, in <module>
disassembleClass(readTarget, targets, args.out, args.roundtrip)
File "disassemble.py", line 21, in disassembleClass
print 'processing target {}, {}/{} remaining'.format(target.encode('utf8'), len(targets)-i, len(targets))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 6: ordinal not in range(128)
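For anyone following along, here is a rough sketch of why that line can raise a UnicodeDecodeError even though it calls encode. In Python 2, calling .encode('utf8') on a byte str first decodes it as ASCII behind the scenes. The snippet is written for Python 3, which has no implicit step, so the decode is spelled out. The helper name implicit_encode is made up for illustration, and the byte string is copied from the KeyError message in this thread.

```python
# Python 3 sketch of the implicit step that Python 2 performs when
# .encode('utf8') is called on a byte string.
# The bytes below are copied from the KeyError message in this thread.
raw_target = b'Nyrox/\xa3\xc5/\xa3\xc2.class'


def implicit_encode(byte_string):
    """Hypothetical helper: mimic Python 2's str.encode('utf8'),
    which decodes as ASCII first and only then encodes as UTF-8."""
    return byte_string.decode('ascii').encode('utf8')


try:
    implicit_encode(raw_target)
    failure = None
except UnicodeDecodeError as exc:
    # The ASCII decode fails on the first byte >= 0x80.
    failure = exc

print(failure)
```

The failure should point at the first non-ASCII byte, which is why the traceback complains about decoding rather than encoding.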
If I remove the problematic encode call, then this occurs:
C:\Python27\lib\zipfile.py:906: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
info = self.NameToInfo.get(name)
Traceback (most recent call last):
File "disassemble.py", line 63, in <module>
disassembleClass(readTarget, targets, args.out, args.roundtrip)
File "disassemble.py", line 23, in disassembleClass
data = readTarget(target)
File "disassemble.py", line 57, in readArchive
with archive.open(name) as f:
File "C:\Python27\lib\zipfile.py", line 961, in open
zinfo = self.getinfo(name)
File "C:\Python27\lib\zipfile.py", line 909, in getinfo
'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'Nyrox/\\xa3\\xc5/\\xa3\\xc2.class' in the archive"
Should there be a method of targeting a specific file with Unicode characters? Perhaps whatever hash Krakatau happens to use to write the files can be used? It may be inefficient but it does get the job done
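One possible approach, as a minimal sketch only: decode the raw argument with a few candidate encodings and pick the first result that actually names an entry in the archive. This is written for Python 3 and is not how Krakatau does it. The helper find_entry, the list of candidate encodings, and the in-memory archive are all assumptions made up for the example.

```python
import io
import zipfile


def find_entry(archive, raw_name, encodings=('utf-8', 'cp1252')):
    """Hypothetical helper: decode raw_name with each candidate encoding and
    return the first result that names an entry in the archive, else None."""
    names = set(archive.namelist())
    for encoding in encodings:
        try:
            candidate = raw_name.decode(encoding)
        except UnicodeDecodeError:
            continue  # These bytes are not valid in this encoding.
        if candidate in names:
            return candidate
    return None


# Build a tiny in-memory archive with a non-ASCII entry name (demo data only).
entry_name = 'Nyrox/\u00a3\u00c5/\u00a3\u00c2.class'
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr(entry_name, b'\xca\xfe\xba\xbe')

# Simulate the bytes a Windows console might hand to the script (assumed cp1252).
raw_argument = entry_name.encode('cp1252')

with zipfile.ZipFile(buffer) as archive:
    resolved = find_entry(archive, raw_argument)
    data = archive.read(resolved) if resolved is not None else None
```

Guessing encodings like this can pick the wrong entry if two candidates both happen to match, so it is probably only reasonable as a fallback after an exact lookup fails.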
What did you do to cause the error?
Relevant code can be found here:
I suspect passing a Unicode filename to disassemble.py is not handled properly as I can disassemble the file fine from the command line
So is the problem that Java is passing unicode as the command line argument?
It's odd because I actually did test passing a unicode filepath in the command prompt, but maybe the encoding is different when you do it that way.
I'll do some research about Java and Unicode on the command line and get back to you
I tried running this command in Git Bash and cmd:
py disassemble.py -path Nyrox17Priv.jar Nyrox/nyrox_nyroxinvis_Ó/nyrox_nyroxinvis_Ç.class
The results were
Traceback (most recent call last):
File "disassemble.py", line 63, in <module>
disassembleClass(readTarget, targets, args.out, args.roundtrip)
File "disassemble.py", line 21, in disassembleClass
print 'processing target {}, {}/{} remaining'.format(target.encode('utf8'), len(targets)-i, len(targets))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd3 in position 23: ordinal not in range(128)
And once I removed the offending encode call,
On Git Bash:
C:\Python27\lib\zipfile.py:906: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
info = self.NameToInfo.get(name)
Traceback (most recent call last):
File "disassemble.py", line 63, in <module>
disassembleClass(readTarget, targets, args.out, args.roundtrip)
File "disassemble.py", line 23, in disassembleClass
data = readTarget(target)
File "disassemble.py", line 57, in readArchive
with archive.open(name) as f:
File "C:\Python27\lib\zipfile.py", line 961, in open
zinfo = self.getinfo(name)
File "C:\Python27\lib\zipfile.py", line 909, in getinfo
'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'Nyrox/nyrox_nyroxinvis_\\xd3/nyrox_nyroxinvis_\\xc7.class' in the archive"
On cmd:
Traceback (most recent call last):
File "disassemble.py", line 63, in <module>
disassembleClass(readTarget, targets, args.out, args.roundtrip)
File "disassemble.py", line 23, in disassembleClass
data = readTarget(target)
File "disassemble.py", line 57, in readArchive
with archive.open(name) as f:
File "C:\Python27\lib\zipfile.py", line 961, in open
zinfo = self.getinfo(name)
File "C:\Python27\lib\zipfile.py", line 909, in getinfo
'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'Nyrox/nyrox_nyroxinvis_+/nyrox_nyroxinvis_\\xa6.class' in the archive"
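To illustrate why the two shells can hand different bytes to the script for the same typed characters, here is a small Python 3 sketch that encodes Ó and Ç under a few code pages. The code pages listed are chosen for illustration only. Which ones Git Bash and cmd actually used here is unknown.

```python
# The two non-ASCII characters from the command line in this thread.
characters = ['\u00d3', '\u00c7']  # Ó and Ç

# Code pages chosen for illustration; the real console code pages are unknown.
code_pages = ['cp1252', 'cp850', 'utf-8']

# Map each code page to the bytes it produces for each character.
encoded = {
    page: [ch.encode(page) for ch in characters]
    for page in code_pages
}

for page, values in encoded.items():
    print(page, values)
```

The cp1252 bytes line up with the \xd3 and \xc7 in the Git Bash KeyError, which suggests (but does not prove) that the argument arrived in that encoding there. The archive lookup then fails whenever those bytes differ from the name stored in the jar.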
Oh I see, you're using -path with a jar and a unicode class name. I didn't test that case. I'll look into it later.
Hopefully this should fix the issues once and for all. Though I think it will be hard to completely eliminate the issues as long as I'm stuck on Python 2.
Unfortunately it still does not appear to work. Even disregarding the apparently missed encode call here: https://github.com/Storyyeller/Krakatau/blob/master/disassemble.py#L57,
the error of "There is no item named
edit: I tried messing around with it a bit and wow, Python is really finicky about this. I don't blame you if you don't want to mess with this right now
Is it possible for Helios to extract the class to a temp file with a different file name (like 1.class), @samczsun? (And then add the original jar to the path.)
Just an update, I'm going to be deploying an update to Helios with some of my attempted bugfixes. I'll submit a PR in a little bit or if you want you can merge it directly.
This is just based on my fixing what appears to be broken, so there may be many cases I haven't accounted for, and I might have caused regressions. However, there don't appear to be many tests pertaining to Unicode filenames, so I think I should be good
On a side note: I've often found in the past that fixing the errors for one platform just breaks it on another platform, or when running with a different set of options. These things are annoyingly inconsistent.
Exactly. These fixes I've tested on Windows 10. If you don't use Windows perhaps you could see if I've caused regressions (?).
I can test on some Linux distributions on the weekend
I used to develop on Windows, but last year I switched to Linux. I still have my Windows laptop, but it's a huge pain to test on both.
I don't know how well GitHub handles Unicode filenames, but perhaps we could create some tests for this particular bug and run it in our respective primary environments?
Encoding of filenames is fairly inconsistent. But I suppose I could use a jar test, now that I added support for jars.
Of course, the most interesting part to test is the handling of filenames as passed in from the command line and output, and that's not covered by tests at all.
Yeah, that's especially true on Windows. There were different behaviours when decompiling/disassembling from an archive versus from a directory (some names came out as unicode, others as str). Hopefully Linux isn't as odd
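A common way to cope with names arriving as a mix of unicode and str is to normalize everything to text once, at the boundary, and only encode again when printing or writing. Here is a minimal Python 3 sketch of that idea. The helper to_text and its fallback encoding are assumptions for illustration, not Krakatau code.

```python
import sys


def to_text(name, encoding=None):
    """Hypothetical helper: return name as text, decoding byte strings once.

    Falls back to the filesystem encoding when none is given, and replaces
    undecodable bytes instead of raising.
    """
    if isinstance(name, bytes):
        return name.decode(encoding or sys.getfilesystemencoding(), errors='replace')
    return name


# Names from different sources: one already text, one still raw UTF-8 bytes.
mixed_names = ['Nyrox/\u00d3.class', b'Nyrox/\xc3\x93.class']
normalized = [to_text(name, 'utf-8') for name in mixed_names]
unique_names = set(normalized)
```

After normalization both entries compare equal, so lookups and comparisons no longer depend on which code path produced the name.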
Obfuscated jar with some funky paths. If you need the JAR I can send it to you