Storyyeller / Krakatau

Java decompiler, assembler, and disassembler
GNU General Public License v3.0

UnicodeEncodeError #78

Closed samczsun closed 7 years ago

samczsun commented 8 years ago
Traceback (most recent call last):
  File "decompile.py", line 155, in <module>
    decompileClass(path, targets, args.out, args.skip, magic_throw=args.xmagicthrow)
  File "decompile.py", line 113, in decompileClass
    package = 'package {};\n\n'.format(target.replace('/','.').rpartition('.')[0])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 9: ordinal not in range(128)

Obfuscated jar with some funky paths. If you need the JAR I can send it to you
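
For readers unfamiliar with this failure: in Python 2, formatting a byte-string template with a unicode argument makes the interpreter encode that argument with the ASCII codec, which is where the traceback above comes from. The sketch below performs that encode step explicitly so that it also runs on Python 3. The helper name and the sample path are made up for illustration and are not taken from Krakatau.

```python
def ascii_encode_position(text):
    """Return the index of the first character the ASCII codec rejects, or None."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc.start
    return None


# Hypothetical obfuscated class path with a no-break space (U+00A0) at index 9.
target = u'pkg/abcde\u00a0/Main'

# Same transformation as line 113 of decompile.py in the traceback.
package = target.replace(u'/', u'.').rpartition(u'.')[0]

# The offending character is reported at the same kind of offset as "position 9".
bad_index = ascii_encode_position(package)
```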

Storyyeller commented 8 years ago

Ug, unicode errors. No matter how many times I try to fix them, they seem to pop up again. I wish I could use Python 3 style strings.

samczsun commented 8 years ago

Why not use Python 3? I'm not familiar with the history of Krakatau, so apologies if you've mentioned it somewhere.

Storyyeller commented 8 years ago

Mostly because it's a ton of work to upgrade and PyPy has poor support for Python 3.

MrMetric commented 8 years ago

Don't forget about from __future__ import unicode_literals
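
A minimal sketch of what that suggestion buys, assuming the target name is already text: with a unicode template, the formatting stays in text and no implicit ASCII encode happens. The `u''` prefix is used here as the per-literal equivalent of `unicode_literals`, which applies the same treatment to every string literal in a module. The function name `package_line` is hypothetical.

```python
def package_line(target):
    """Build the package declaration from a class path without leaving text."""
    # A unicode template means .format() never has to squeeze the
    # argument through the ASCII codec.
    package = target.replace(u'/', u'.').rpartition(u'.')[0]
    return u'package {};\n\n'.format(package)


# Hypothetical class path containing a no-break space (U+00A0).
line = package_line(u'pkg/abcde\u00a0/Main')
```

On Python 2 the result is a unicode object, so it still has to be encoded explicitly (for example as UTF-8) before it is written to a byte stream.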

FFY00 commented 8 years ago

I have the same problem. I'm getting this error in every file I try to disassemble.

processing target dec/Main.class, 1/1 remaining
Traceback (most recent call last):
  File "disassemble.py", line 63, in <module>
    disassembleClass(readTarget, targets, args.out, args.roundtrip)
  File "disassemble.py", line 31, in disassembleClass
    filename = out.write(name, output.getvalue())
  File "PATH\Krakatau\Krakatau\script_util.py", line 109, in write
    out = self.makepath(cname)
  File "PATH\Krakatau\Krakatau\script_util.py", line 88, in winSanitizePath
    return '\?{}{}{}'.format(base, path, suffix)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 30-31: ordinal not in range(128)

Storyyeller commented 8 years ago

You are on Windows, right? I'm not sure how that is even possible, because on Windows it sanitizes paths to never use non-ASCII characters. I even just double checked the code in question.

Janmm14 commented 7 years ago

Hi!

I just want to bump this issue (the original one), as it's fairly annoying. It also happens when disassembling classes, btw.

Storyyeller commented 7 years ago

Could you provide an example jar? I suppose I can at least try to fix the places where people are seeing errors.

Janmm14 commented 7 years ago

Sure, I will do it today. (Btw, when disassembling, the last line of the error message is the same, except that the character and the position change, obviously.)

Janmm14 commented 7 years ago

Here is the file; it's just a renamed .jar file.

Edit: Btw. I just used Krakatau in combination with https://github.com/helios-decompiler/helios

Storyyeller commented 7 years ago

Should be fixed now. Tell me if you see any more issues.

samczsun commented 7 years ago

I've upgraded Helios with the latest revision of Krakatau but it appears that the UnicodeDecodeError still occurs:

Traceback (most recent call last):
  File "disassemble.py", line 63, in <module>
    disassembleClass(readTarget, targets, args.out, args.roundtrip)
  File "disassemble.py", line 21, in disassembleClass
    print 'processing target {}, {}/{} remaining'.format(target.encode('utf8'), len(targets)-i, len(targets))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 6: ordinal not in range(128)

If I remove the problematic encode call, then this occurs:

C:\Python27\lib\zipfile.py:906: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  info = self.NameToInfo.get(name)
Traceback (most recent call last):
  File "disassemble.py", line 63, in <module>
    disassembleClass(readTarget, targets, args.out, args.roundtrip)
  File "disassemble.py", line 23, in disassembleClass
    data = readTarget(target)
  File "disassemble.py", line 57, in readArchive
    with archive.open(name) as f:
  File "C:\Python27\lib\zipfile.py", line 961, in open
    zinfo = self.getinfo(name)
  File "C:\Python27\lib\zipfile.py", line 909, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'Nyrox/\\xa3\\xc5/\\xa3\\xc2.class' in the archive"

Should there be a method of targeting a specific file with Unicode characters? Perhaps whatever hash Krakatau happens to use to write the files can be used? It may be inefficient but it does get the job done

Storyyeller commented 7 years ago

What did you do to cause the error?

samczsun commented 7 years ago

Relevant code can be found here:

https://github.com/helios-decompiler/Helios/blob/master/src/main/java/com/heliosdecompiler/helios/transformers/disassemblers/KrakatauDisassembler.java#L59

I suspect passing a Unicode filename to disassemble.py is not handled properly, as I can disassemble the file fine from the command line.

Storyyeller commented 7 years ago

So is the problem that Java is passing unicode as the command line argument?

It's odd because I actually did test passing a unicode filepath in the command prompt, but maybe the encoding is different when you do it that way.
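
Some background on the traceback above: in Python 2, `sys.argv` holds byte strings, and calling `.encode('utf8')` on a byte string first decodes it with the ASCII codec, which is what raises the UnicodeDecodeError. One way to avoid that, sketched below under the assumption that the encoding of the incoming bytes is known or can be guessed from the locale, is to normalise every name to text once at the boundary. The helper names are hypothetical.

```python
import locale


def to_text(value, encoding=None):
    """Return value as text, decoding only when it is a byte string."""
    if isinstance(value, bytes):
        # Assumption: fall back to the locale's preferred encoding when the
        # caller does not say how the bytes were produced.
        return value.decode(encoding or locale.getpreferredencoding(False))
    return value


def to_utf8(value, encoding=None):
    """Return UTF-8 bytes for text or bytes input, with no implicit ASCII step."""
    return to_text(value, encoding).encode('utf-8')
```

For example, `to_utf8(b'\xd3', 'cp1252')` decodes the byte as Windows-1252 before re-encoding it, instead of failing on the first byte above 0x7F.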

samczsun commented 7 years ago

I'll do some research about Java and Unicode on the command line and get back to you

samczsun commented 7 years ago

I tried running this command in Git Bash and cmd:

py disassemble.py -path Nyrox17Priv.jar Nyrox/nyrox_nyroxinvis_Ó/nyrox_nyroxinvis_Ç.class

The results were

Traceback (most recent call last):
  File "disassemble.py", line 63, in <module>
    disassembleClass(readTarget, targets, args.out, args.roundtrip)
  File "disassemble.py", line 21, in disassembleClass
    print 'processing target {}, {}/{} remaining'.format(target.encode('utf8'), len(targets)-i, len(targets))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd3 in position 23: ordinal not in range(128)

And once I removed the offending encode call,

On Git Bash:

C:\Python27\lib\zipfile.py:906: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  info = self.NameToInfo.get(name)
Traceback (most recent call last):
  File "disassemble.py", line 63, in <module>
    disassembleClass(readTarget, targets, args.out, args.roundtrip)
  File "disassemble.py", line 23, in disassembleClass
    data = readTarget(target)
  File "disassemble.py", line 57, in readArchive
    with archive.open(name) as f:
  File "C:\Python27\lib\zipfile.py", line 961, in open
    zinfo = self.getinfo(name)
  File "C:\Python27\lib\zipfile.py", line 909, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'Nyrox/nyrox_nyroxinvis_\\xd3/nyrox_nyroxinvis_\\xc7.class' in the archive"

On cmd

Traceback (most recent call last):
  File "disassemble.py", line 63, in <module>
    disassembleClass(readTarget, targets, args.out, args.roundtrip)
  File "disassemble.py", line 23, in disassembleClass
    data = readTarget(target)
  File "disassemble.py", line 57, in readArchive
    with archive.open(name) as f:
  File "C:\Python27\lib\zipfile.py", line 961, in open
    zinfo = self.getinfo(name)
  File "C:\Python27\lib\zipfile.py", line 909, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'Nyrox/nyrox_nyroxinvis_+/nyrox_nyroxinvis_\\xa6.class' in the archive"

Storyyeller commented 7 years ago

Oh I see, you're using -path with a jar and a unicode class name. I didn't test that case. I'll look into it later.
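
The two KeyErrors above show the same class name arriving as different bytes depending on the shell. `\xd3` and `\xc7` are the Windows-1252 bytes for Ó and Ç. The sketch below is one possible way to cope with that, not the fix that was applied in Krakatau: it decodes the raw name with a few candidate encodings and keeps the first result that is actually present in the archive. It is written for Python 3, where `zipfile` returns member names as text. The function name and the list of encodings are assumptions.

```python
import io
import zipfile


def find_member(archive, raw_name, encodings=('utf-8', 'cp1252', 'cp437')):
    """Return the archive member matching raw_name bytes, or None."""
    names = set(archive.namelist())
    for encoding in encodings:
        try:
            candidate = raw_name.decode(encoding)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
        if candidate in names:
            return candidate
    return None


# Build a small in-memory jar whose member name mirrors the one in the report.
member = u'Nyrox/nyrox_nyroxinvis_\u00d3/nyrox_nyroxinvis_\u00c7.class'
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as jar:
    jar.writestr(member, b'\xca\xfe\xba\xbe')
```

A lossy substitution, such as the `+` in the cmd output, cannot be recovered this way, because the original character is already gone by the time the script sees the argument.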

Storyyeller commented 7 years ago

Hopefully this should fix the issues once and for all. Though I think it will be hard to completely eliminate the issues as long as I'm stuck on Python 2.

samczsun commented 7 years ago

Unfortunately it still does not appear to work. Even disregarding the apparently missed encode call here: https://github.com/Storyyeller/Krakatau/blob/master/disassemble.py#L57,

the "There is no item named ... in the archive" error is still occurring.

Edit: I tried messing around with it a bit and wow, Python is really finicky about this. I don't blame you if you don't want to mess with this right now.

Janmm14 commented 7 years ago

@samczsun, would it be possible for Helios to extract the file to a temp file with a different file name (like 1.class), and add the original jar to the path?
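
A rough sketch of that workaround in Python, assuming the caller already has the archive open and knows the member name as text. The function name is hypothetical. Helios itself is written in Java, so this only illustrates the idea.

```python
import os
import shutil
import tempfile
import zipfile


def extract_to_ascii_temp(archive, member, suffix='.class'):
    """Copy one archive member to a temp file with a generated file name.

    The file name comes from tempfile rather than from the archive, so the
    non-ASCII member name never reaches the command line. The temp directory
    itself may still contain non-ASCII characters.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, 'wb') as dst, archive.open(member) as src:
        shutil.copyfileobj(src, dst)
    return path
```

The caller is responsible for deleting the returned file once the disassembler has finished with it.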

samczsun commented 7 years ago

Just an update, I'm going to be deploying an update to Helios with some of my attempted bugfixes. I'll submit a PR in a little bit or if you want you can merge it directly.

This is just based on my fixing what appears to be broken, so there may be many cases I haven't accounted for, and I might have caused regressions. However, there don't appear to be many tests pertaining to Unicode filenames, so I think I should be good.

Storyyeller commented 7 years ago

On a side note: I've often found in the past that fixing the errors for one platform just breaks it on another platform, or when running with a different set of options. These things are annoyingly inconsistent.

samczsun commented 7 years ago

Exactly. These fixes I've tested on Windows 10. If you don't use Windows perhaps you could see if I've caused regressions (?).

I can test on some Linux distributions on the weekend

Storyyeller commented 7 years ago

I used to develop on Windows, but last year I switched to Linux. I still have my Windows laptop, but it's a huge pain to test on both.

samczsun commented 7 years ago

I don't know how well GitHub handles Unicode filenames, but perhaps we could create some tests for this particular bug and run it in our respective primary environments?

Storyyeller commented 7 years ago

Encoding of filenames is fairly inconsistent. But I suppose I could use a jar test, now that I added support for jars.

Of course, the most interesting part to test is the handling of filenames as passed in from the command line and output, and that's not covered by tests at all.
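
As an illustration of what such a jar test could look like, here is a small sketch that builds a jar in memory, so no non-ASCII filenames need to be checked into the repository, and verifies that the member names survive a round trip. It is written for Python 3 and does not cover the command-line handling mentioned above. The helper names and the fixture contents are hypothetical.

```python
import io
import zipfile


def make_test_jar(entries):
    """Build a jar (zip) in memory from a {member name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    return buffer.getvalue()


def read_test_jar(jar_bytes):
    """Read every member back into a {member name: bytes} mapping."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return {name: jar.read(name) for name in jar.namelist()}


# Hypothetical fixture modelled on the class names reported in this thread.
FIXTURE = {
    u'Nyrox/nyrox_nyroxinvis_\u00d3/nyrox_nyroxinvis_\u00c7.class': b'\xca\xfe\xba\xbe',
    u'plain/Main.class': b'\xca\xfe\xba\xbe',
}
```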

samczsun commented 7 years ago

Yeah, that's especially true on Windows. There were different behaviours when decompiling/disassembling from an archive versus from a directory (some names came out as unicode, others as str). Hopefully Linux isn't as odd