Esshahn / utterances-comments

comment repo for utterances

blog/pydisass/ #3

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

awsm - PyDisAss: A 6502 disassembler in Python

retro computing with a flavor of nerdiness

https://www.awsm.de/blog/pydisass/

Michel-FK commented 3 years ago

http://archive.6502.org/publications/dr_dobbs_journal_selected_articles/6502_disassembler_from_apple.pdf

By Steve Wozniak & Allen Baum:

"This subroutine package is used to display single or sequential 6502 instructions in mnemonic form. The subroutines are tailored to disassemblers and debugging aids, but tables with more general usage (assemblers) are included. The subroutines occupy one page (256 bytes) and tables most of another. Seven page zero locations are used."

kmonson commented 3 years ago

I think a simpler approach may be to recursively trace through the instructions flagging reached bytes as code until all code paths have been exhausted. Everything flagged as code is decoded and everything else is considered data.

The only two things I can come up with that thwart this method are a conditional jump that is always taken placed right before data, and self-modifying code that modifies itself to include a jump to a previously unreachable location.
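For illustration, the tracing idea described above could look roughly like the following. This is only a sketch under stated assumptions: the `OPCODES` table covers a handful of opcodes rather than the full 6502 set, the function name `trace_code` and the sample `program` bytes are made up for this example, and an explicit worklist stands in for recursion. It is not taken from PyDisAss.

```python
# Tiny subset of the 6502 opcode table: opcode -> (mnemonic, length in bytes).
# Assumption: a real tool would need the complete table.
OPCODES = {
    0xA9: ("LDA #", 2),
    0x8D: ("STA abs", 3),
    0xEA: ("NOP", 1),
    0x4C: ("JMP abs", 3),
    0x20: ("JSR abs", 3),
    0x60: ("RTS", 1),
    0xD0: ("BNE rel", 2),
    0xF0: ("BEQ rel", 2),
}
BRANCHES = {0xD0, 0xF0}


def trace_code(data, origin, entry):
    """Return the set of addresses flagged as code, starting at entry."""
    code = set()
    worklist = [entry]  # pending code paths (used instead of recursion)
    end = origin + len(data)
    while worklist:
        pc = worklist.pop()
        # Follow one path until it leaves the program or hits flagged bytes.
        while origin <= pc < end and pc not in code:
            opcode = data[pc - origin]
            if opcode not in OPCODES:
                break  # unknown opcode: treat as a dead end
            length = OPCODES[opcode][1]
            if pc + length > end:
                break  # instruction would run past the end of the data
            code.update(range(pc, pc + length))
            operand = data[pc - origin + 1 : pc - origin + length]
            if opcode in BRANCHES:
                # Relative offset is a signed 8-bit value.
                offset = operand[0] - 256 if operand[0] >= 128 else operand[0]
                worklist.append(pc + 2 + offset)  # branch taken
                pc += length                      # branch not taken
            elif opcode == 0x4C:
                pc = operand[0] | (operand[1] << 8)  # follow the jump
            elif opcode == 0x20:
                worklist.append(operand[0] | (operand[1] << 8))  # subroutine
                pc += length                                     # return path
            elif opcode == 0x60:
                break  # RTS ends this path
            else:
                pc += length
    return code


# Hypothetical program loaded at $C000, with two data bytes at $C005-$C006.
program = bytes([
    0xA9, 0x00,        # C000 LDA #$00
    0x4C, 0x07, 0xC0,  # C002 JMP $C007
    0x48, 0x49,        # C005 data
    0x20, 0x0B, 0xC0,  # C007 JSR $C00B
    0x60,              # C00A RTS
    0xEA,              # C00B NOP
    0x60,              # C00C RTS
])
code = trace_code(program, 0xC000, 0xC000)
data_bytes = sorted(set(range(0xC000, 0xC000 + len(program))) - code)
```

Everything left in `data_bytes` was never reached from the entry point and would be emitted as data.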

Esshahn commented 3 years ago

Good suggestion, which is partly done with the latest approach and could be extended further.

semiversus commented 3 years ago

How would you handle jmp (addr) instructions? At this point you can jump possibly everywhere...

kmonson commented 3 years ago

> How would you handle jmp (addr) instructions? At this point you can jump possibly everywhere...

That is a good point, however I don't think the 6502 directly supports jumps to an address determined at runtime unless the code is self-modifying. There is no instruction to jump to X or Y, just a baked-in address or relative location. You can safely (99.99% of the time) assume that the target of a jump instruction is more code.

So, in answer to your question, you would just follow the jump instruction to wherever it takes you and continue flagging bytes as code from there. If it jumps past the end of the program data, then you can consider that a dead end and assume that the code at that address will be generated at runtime, so there is really nothing to decompile anyway.

That is not to say that my suggested method is perfect. I did mention a few cases where it falls down. (With self modifying code I'm not convinced a perfect decompiler is possible.)

semiversus commented 3 years ago

It's used and it's not self modifying code! See Super Mario Bros. for NES as example: https://gist.github.com/1wErt3r/4048722#file-smbdis-asm-L2383 . It is used for jump tables in a clever way. I worked on JIT compiling for a 6502 emulator and those indirect jumps stopped my run (actually even more trivial 6502 announces...). By the way: You're doing cool stuff ;-)
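To make the problem concrete, here is a small sketch of how the target of an indirect `JMP (addr)` is resolved. The function name `indirect_jump_target` and the zero-page pointer address `$0006` are assumptions chosen for this example, not details taken from the linked disassembly. The target comes from two bytes in memory, so the same instruction can land somewhere different each time those bytes change, which is what a static trace cannot see.

```python
def indirect_jump_target(memory, pointer):
    """Resolve the target of JMP (pointer) from a 64 KiB memory snapshot."""
    lo = memory[pointer]
    # The original NMOS 6502 does not carry into the next page when fetching
    # the high byte, so a pointer at $xxFF reads its high byte from $xx00.
    hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    hi = memory[hi_addr]
    return lo | (hi << 8)


memory = bytearray(0x10000)

# A jump-table routine might store one table entry in the pointer...
memory[0x0006] = 0x34
memory[0x0007] = 0x12
first_target = indirect_jump_target(memory, 0x0006)

# ...and a different entry on the next call, using the same JMP ($0006).
memory[0x0006] = 0x00
memory[0x0007] = 0x80
second_target = indirect_jump_target(memory, 0x0006)

# Page-boundary quirk: pointer at $02FF takes its high byte from $0200.
memory[0x02FF] = 0xCD
memory[0x0200] = 0xAB
wrapped_target = indirect_jump_target(memory, 0x02FF)
```

A disassembler that knows where the jump table lives could add each table entry as an extra entry point, but finding the table usually takes manual inspection.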

Esshahn commented 3 years ago

Cool, thanks for sharing semiversus :)

kmonson commented 3 years ago

I stand corrected! Thank you for pointing that out. I misread that part while going through the 6502 instruction set to try to answer your first question.

Hmm... My other thought for this problem was to essentially run the code in an emulator, flagging code bytes as you went. Of course there is the problem of not really knowing if you've hit every possible code path. I think that this may not be truly knowable unless there are no jmp commands or self modifying code. (Not to mention unintended code paths resulting from bugs. See any TAS with arbitrary code execution.)

Another thought is that once all code paths are exhausted using the recursive method above you could look for anything that looks like a jump back to known code bytes using an absolute jump and then work backwards from there. But then if you work backward you don't truly know where to stop unless you hit known code or invalid code and that still is no guarantee.
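A rough sketch of the first half of that idea, scanning the unflagged bytes for anything that looks like an absolute `JMP` into known code, might look like this. The function name and sample bytes are made up for illustration, and the result is only a list of candidates: a data byte that happens to be `$4C` followed by a plausible address produces a false positive, and the sketch does not attempt the backward walk at all.

```python
JMP_ABSOLUTE = 0x4C  # opcode for JMP with a 16-bit absolute address


def find_jumps_into_known_code(data, origin, known_code):
    """List (address, target) pairs for unflagged bytes that look like
    an absolute JMP whose target is an already flagged code address."""
    candidates = []
    for i in range(len(data) - 2):
        address = origin + i
        if address in known_code:
            continue  # already decoded as code
        if data[i] == JMP_ABSOLUTE:
            target = data[i + 1] | (data[i + 2] << 8)  # little-endian operand
            if target in known_code:
                candidates.append((address, target))
    return candidates


# Hypothetical image at $C000: three traced code bytes, then untraced bytes
# that end in what looks like JMP $C000.
image = bytes([
    0xA9, 0x00, 0x60,  # C000 LDA #$00 / RTS (already flagged as code)
    0x11, 0x22,        # C003 unknown bytes
    0x4C, 0x00, 0xC0,  # C005 looks like JMP $C000
])
known = {0xC000, 0xC001, 0xC002}
candidates = find_jumps_into_known_code(image, 0xC000, known)
```

Each candidate is a possible end of an undiscovered routine, and deciding where that routine starts remains the open question raised above.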

If the goal is still to decompile one specific thing, then you'll probably figure out pretty quickly what your decompiler does and does not need to work with.

OK OK OK... I need to stop thinking about this. :)

rnbastos commented 3 years ago

Hello! Here's your article in my blog, translated into Portuguese. https://www.krull.com.br/2021/05/01/um-desassemblador-6502-em-python/

Esshahn commented 3 years ago

Amazing achievement! Thank you for the translation to Portuguese, rnbastos!