Storyyeller / Krakatau

Java decompiler, assembler, and disassembler
GNU General Public License v3.0

Interest in Krakatau 2? #185

Open Storyyeller opened 2 years ago

Storyyeller commented 2 years ago

@KOLANICH @janmm14 @QwertyYtPl @samczsun @lab313ru @Dmunch04

I've been thinking about doing a complete ground-up redesign and modernization of Krakatau, but I'm not sure if there is enough interest to justify the effort, so I was curious if anyone would be interested in such a project. One particular problem is that I haven't been active in Java reverse engineering myself since 2015 or so, so I would be reliant on users to do all the testing. What do you think?

fee1-dead commented 1 year ago

Rust itself is the language incapable of proper dynamic linking and code reuse, so it fetches and rebuilds all the deps

This is not true at all. Python and Java don't do this because they by default cannot output statically linked executables.

When you compare Rust executables with other languages that have an entire runtime behind them it will never generate any meaningful result. If you want to compare Rust executables with Python applications, you could include the size of the python interpreter and its support libraries.

Janmm14 commented 1 year ago

@KOLANICH He wants to recode the decompiler in rust as well, so I don't think he will do sth with "legacy" python code. Also I am quite sure that the speed of the python decompiler being fed with disassembled code vs being fed with pure java files would not differ that much at all. Disassembly should be a rather lightweight task compared to decompilation.

KOLANICH commented 1 year ago

This is not true at all.

  1. I don't see support of discovering of shared libs written in Rust installed within system in cargo, like we do with C and C++ libs using pkg-config and CMake
  2. cargo, which is undoubtedly an integral part of the language, locks the versions of the dependencies
  3. people, who are mostly conformists, follow the ways imposed by the cargo conception, and tend to rely on it. It causes a zoo of libs whose API changes between versions in an incompatible way. This makes the situation worse. Even if cargo devs implemented the workflow with shared libs within cargo, it would still be unusable and people would have to use it the way it is currently used. So we cannot expect cargo devs to implement the needed functionality.
  4. one of the arguments I have heard, is that it is impossible to enforce Rust guarantees when using dynamic linking.

It seems that Rust is and will be built around static linking. And if one needs dynamic-linking-capable-Rust, it will be not Rust, but a completely different language with an own ecosystem of libs and tools.

Amejonah1200 commented 1 year ago

And if one needs dynamic-linking-capable-Rust

They would use WASM for plugins like feather-rs does.

locks the versions of the dependencies

Locking dependencies is a supply chain security measure: it mitigates cases where someone overrides an existing version with a different binary/code. Locking also saves the hashes, so the package manager will block these sorts of attacks!
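
As a rough illustration of the hash check described above (a minimal sketch, not Cargo's actual implementation): a lock file records a checksum per package, and a download whose hash differs from the recorded one is rejected. SHA-256 is used here; the function names, package name, and contents are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_locked(name: str, data: bytes, lock: dict) -> bool:
    """Return True only if `data` hashes to the digest recorded for `name`."""
    expected = lock.get(name)
    return expected is not None and sha256_hex(data) == expected


# Hypothetical package contents and lock table, for illustration only.
original = b"pub fn answer() -> u32 { 42 }"
lock = {"example-crate 1.0.0": sha256_hex(original)}

# Same name and version, but different contents (an overridden release).
tampered = b"pub fn answer() -> u32 { 41 }"

print(verify_locked("example-crate 1.0.0", original, lock))  # True
print(verify_locked("example-crate 1.0.0", tampered, lock))  # False
```

The point of the design is that the version string alone is not trusted: a republished package under the same version fails the check unless its bytes are identical.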

KOLANICH commented 1 year ago

He wants to recode the decompiler in rust as well, so I don't think he will do sth with "legacy" python code.

I know. But to be able to test it properly, it is better to have the project split into small parts that can be debugged separately.

Also I am quite sure that the speed of the python decompiler being fed with disassembled code vs being fed with pure java files would not differ that much at all. Disassembly should be a rather lightweight task compared to decompilation.

It is not about speed. It is about being able to verify that the Python and Rust impls do the same things, and to be able to use another impl if one of them malfunctions.

Also Python impl is good for prototyping and tinkering. Compilation of Rust impl takes quite some time even after a change of 1 line in one file.

fee1-dead commented 1 year ago

  1. I don't see support of discovering of shared libs written in Rust installed within system in cargo, like we do with C and C++ libs using pkg-config and CMake

There is. But for C libraries. openssl-sys links to openssl dynamically.

fee1-dead commented 1 year ago

Also Python impl is good for prototyping and tinkering. Compilation of Rust impl takes quite some time even after a change of 1 line in one file.

Is python still going to be maintained? Also you should try incremental compilation as that would improve compile times after small changes.

KOLANICH commented 1 year ago

They would use WASM for plugins like feather-rs does.

sometimes one facepalm is not enough

KOLANICH commented 1 year ago

There is. But for C libraries. openssl-sys links to openssl dynamically.

Thanks for the info. But as I have said, in order to debloat projects written in Rust, it should be de facto working in practice for Rust libs. If it is working in the tools, but the community is in a tragedy of the commons situation, cannot make it work, and prefers to bloat the software, then implementing the needed features in the tools becomes useless, because no one will use them.

Also you should try incremental compilation as that would improve compile times after small changes.

Thanks for the info, I should read about it.

Storyyeller commented 1 year ago

I have tried disassembly of the Scala code relevant to me with the Python and Rust versions. Both versions worked without an error, but the Rust version emitted prettier code. Unfortunately, we cannot directly compare them with diffs.

The Rust version works significantly faster, but not by orders of magnitude: 4.7s (optimized build, almost no difference from the unoptimized one) vs 9.4s (CPython 3.9). The jar is small enough to fit into the FS cache: 1.5 MiB.

The Rust version is bloated. Rust itself is the language incapable of proper dynamic linking and code reuse, so it fetches and rebuilds all the deps. While building this package, the most controversial deps it fetches are deps of the zip crate, ones that are likely never used within jars: zstd, aes and so on.

I guess the next goals can be:

  1. making output of Python and Rust versions of Krakatau comparable.
  2. Allowing the Python decompiler to consume the zip archives created by the disassembler and ensuring that it outputs the same code when fed with the archives produced by Python and Rust versions
  3. optimize the single-threaded impl to make "fast" Rust work really orders of magnitude faster than slow interpreted cpython 3.9.
  4. maybe create a cffi API for the disassembler and integrate it into Python decompiler
  5. .stack_size(256 * 1024 * 1024): I own a machine from 2001 where 256 MiB used to be the whole physical RAM. I have upgraded it to 512 MiB and used the Python version of Krakatau on it successfully (from a graphical LXQt session, so as you can guess, quite some of the RAM was consumed by the GUI apps that are a part of LXQt).

Thanks for the feedback! I'll see if I can optimize it a bit after the holidays. Could you provide the jar you used for benchmarking? I would expect a much larger speedup for optimized builds than that.

KOLANICH commented 1 year ago

mkdir -p ./destdir
wget -O ./destdir/kait.deb https://dl.cloudsmith.io/public/kaitai/debian-unstable/deb/any-distro/pool/any-version/main/k/ka/kaitai-struct-compiler_0.10-SNAPSHOT20220813.105458.a4435936/kaitai-struct-compiler_0.10-SNAPSHOT20220813.105458.a4435936_all.deb
ar x --output=./destdir ./destdir/kait.deb ./data.tar.gz
tar -zxv --directory ./destdir -f ./destdir/data.tar.gz ./usr/share/kaitai-struct-compiler/lib/io.kaitai.kaitai-struct-compiler-0.10-SNAPSHOT20220813.105458.a4435936.jar
mv ./destdir/usr/share/kaitai-struct-compiler/lib/io.kaitai.kaitai-struct-compiler-0.10-SNAPSHOT20220813.105458.a4435936.jar ./kait.jar
rm -rf ./destdir

KOLANICH commented 1 year ago

I'm sorry, it was my fault: I mistakenly ran the debug version instead of the release one (it resided in a different dir). The release one is really faster: 0.6s.
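
For anyone repeating this kind of measurement, a minimal timing sketch follows. It is not the method used in this thread; `best_of` is a hypothetical helper, and the command is a placeholder. By default Cargo writes debug builds to target/debug and release builds to target/release, so passing the full path of the binary makes it explicit which build is being timed.

```python
import subprocess
import sys
import time


def best_of(cmd, runs=3):
    """Run `cmd` `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing command raise instead of being timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Placeholder command so the sketch runs anywhere; substitute the full paths
# of the debug and release binaries you want to compare.
placeholder = [sys.executable, "-c", "pass"]
print(f"best of 3: {best_of(placeholder):.3f}s")
```

Taking the minimum of several runs reduces noise from caching and other processes, which matters when the run time is well under a second.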

Storyyeller commented 1 year ago

@KOLANICH I updated it to remove unnecessary zip dependencies, reducing the binary size from 8.5 MB to 7.3 MB. I also tried to optimize the disassembler. However, it is already so fast that it was difficult to even benchmark or profile, and it looks like a lot of the remaining time is just spent on IO, which is unavoidable, so there doesn't seem to be much potential for further speedups here.

Storyyeller commented 1 year ago

Allowing the Python decompiler to consume the zip archives created by the disassembler and ensuring that it outputs the same code when fed with the archives produced by Python and Rust versions

One other note - I did do extensive testing before release making sure that Py disassembler -> Py assembler, Py disassembler -> Rust assembler, Rust disassembler -> Py assembler, Rust disassembler -> Rust assembler, etc. all give compatible results where expected. In fact, the main reason I backported most of the new features to the Python version was to make this comparison easier.
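
A comparison of this kind could be sketched roughly as below. This is not the test harness used for the release; it only assumes that each toolchain combination produces a zip/jar archive and compares two such archives entry by entry, ignoring timestamps and entry order. All function names are hypothetical. Byte-for-byte equality is stricter than "compatible results where expected", so treat a reported difference as a prompt to inspect, not as proof of a bug.

```python
import io
import zipfile


def archive_contents(source) -> dict:
    """Map each entry name in a zip/jar (path or file-like object) to its bytes."""
    with zipfile.ZipFile(source) as zf:
        return {info.filename: zf.read(info) for info in zf.infolist()}


def diff_archives(a, b) -> dict:
    """Report entries present in only one archive and entries whose bytes differ."""
    left, right = archive_contents(a), archive_contents(b)
    return {
        "only_in_a": sorted(left.keys() - right.keys()),
        "only_in_b": sorted(right.keys() - left.keys()),
        "different": sorted(n for n in left.keys() & right.keys()
                            if left[n] != right[n]),
    }


def make_zip(entries: dict) -> io.BytesIO:
    """Build an in-memory zip from a {name: bytes} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    buf.seek(0)
    return buf


# Made-up archives standing in for the outputs of two toolchain combinations.
out_a = make_zip({"A.class": b"\xca\xfe\xba\xbe one", "B.class": b"same"})
out_b = make_zip({"A.class": b"\xca\xfe\xba\xbe two", "B.class": b"same",
                  "C.class": b"extra"})
print(diff_archives(out_a, out_b))
# {'only_in_a': [], 'only_in_b': ['C.class'], 'different': ['A.class']}
```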

moikeygraham commented 1 year ago

@Storyyeller just wanted to drop a comment here to say the v2 Rust assembler / disassembler is great! Noticed a large speed improvement. Have not run into any issues thus far.

Are there any plans to port the decompiler also?

Storyyeller commented 1 year ago

@Storyyeller just wanted to drop a comment here to say the v2 Rust assembler / disassembler is great! Noticed a large speed improvement. Have not run into any issues thus far.

Are there any plans to port the decompiler also?

Thanks! I hadn't attempted to rewrite the decompiler yet because it would be a lot of work and I worried that no one would use it anyway due to the lack of response on the assembler/disassembler.

KOLANICH commented 1 year ago

worried that no one would use it anyway

I usually use the decompiler component of Krakatau.

Janmm14 commented 1 year ago

The Krakatau decompiler is still the one resisting most obfuscation techniques and providing accurate results. Quiltflower and the other ones cannot compete when having obfuscated bytecode.

It is your decision whether you want to invest the time into this. Decompilation has never been a popular topic. Other decompilers focus on readability and good regular invokedynamic display, Krakatau focuses on correct and mostly runnable decompilability of even highly obfuscated source code at the cost of some readability. Combined with the need for all libraries and python, other decompilers (mostly in java) had it a lot easier.

XenoAmess commented 1 year ago

Yes, you can try Eclipse instead for the decompiler. Decompiling is not the same problem as disassembling...


moikeygraham commented 1 year ago

The Krakatau decompiler is still the one resisting most obfuscation techniques and providing accurate results. Quiltflower and the other ones cannot compete when having obfuscated bytecode.

@KOLANICH +1 to this. It's far superior to anything else out there in my opinion. I will always favour correctness over readability.

Thanks! I hadn't attempted to rewrite the decompiler yet because it would be a lot of work and I worried that no one would use it anyway due to the lack of response on the assembler/disassembler.

@Storyyeller I tend to use all three equally and often!

Storyyeller commented 1 year ago

Thanks for the responses everyone. It might be a while before I have the time, but I'll look into working on the decompiler.

Storyyeller commented 1 year ago

FYI, I started working on the decompiler last week. However, it will take a long time to get anywhere, so don't get your hopes up.

moikeygraham commented 1 year ago

FYI, I started working on the decompiler last week. However, it will take a long time to get anywhere, so don't get your hopes up.

Amazing news 😀 let me know if I can help in any way (my Rust experience is limited, but happy to do some testing / QA)

XenoAmess commented 1 year ago

Hi.

@Storyyeller

Right now it can only do krak2 --roundtrip --out 1 2

but in v1 we could put out first, like -out -roundtrip

Is this intentional?

I really don't think there should be such an ordering limit for -- params...

XenoAmess commented 1 year ago

Like I said before, I'm really interested in krak2, so now we have something like this: https://plugins.jetbrains.com/plugin/18144-bytecode-editor-xenoamess-tpm-/versions/stable/292491 If there are people using JetBrains IDEA who want to play with krak2, it might be the best solution at the moment. (Only Windows is supported currently because I have no money to buy a Mac. If the krak2 repo could provide compiled binaries as releases, that would be appreciated.)

Storyyeller commented 1 year ago

Update: My previous comment on Feb 12 was way too optimistic. I've been busy and haven't gotten the chance to work on the decompiler at all lately.

Today, I finally found the time to work on Krakatau again, but not on the decompiler. I improved the error messages for the assembler and disassembler (#194).

Storyyeller commented 1 year ago

Updated Krakatau v2 to handle the fake directory attack (https://github.com/x4e/fakedirectory). v1 is still affected.

Janmm14 commented 1 year ago

Nice! (although it won't affect my workflow)

Storyyeller commented 1 year ago

Updated Krakatau v2 to ignore CRC checksums in jar files.

XenoAmess commented 1 year ago

@Storyyeller Hi. As it seems fairly stable now, could you please set up an online CI/CD workflow and release Windows, Linux, and Mac binaries for Krakatau 2? Thanks.

Storyyeller commented 1 year ago

I'm not sure how to do that.

Storyyeller commented 1 year ago

Update: I haven't worked on the decompiler at all since early February. I still intend to rewrite it eventually, but I don't know if or when I'll be able to start making progress on it.

tonyspumoni commented 1 year ago

Definitely interested in the v2 decompiler. Happy to jump in and help where I can, too

Storyyeller commented 1 year ago

The basic problem is that I didn't want the v2 decompiler to just be a straight rewrite of v1. I wanted to come up with a better structuring algorithm in order to handle try blocks with multiple catch arms, which the v1 algorithm can't handle. But then I got stuck trying to come up with a new algorithm and eventually gave up.

Kreijstal commented 2 months ago

rip