NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

6800 disassembler is missing #3673

Open sigurasg opened 2 years ago

sigurasg commented 2 years ago

The 6805 disassembler spec is pretty anemic and gets many details wrong, as I found out when I tried to reverse-engineer the ROMs from an old Tektronix scope that uses the MC6808. Thankfully the 6809 spec is quite good, and I was able to derive what seems to be a complete spec for the 6800 family from it, by removing the 6809 extensions and adding back the few 6800 instructions the 6809 isn't compatible with. I have a patch prepared in https://github.com/sigurasg/ghidra/tree/add_6800_specs. At the moment my patch adds a new 6800 language without touching the old 6805 language. I'd be up for removing the old spec in favor of the new one if that's an option.

emteere commented 2 years ago

I've been looking at another problem report with the 6809 which indicated some issues with the 6809 processor specification.

When replacing a processor, we usually try to stay backwards compatible where possible, by using the same processor ID in the ldefs file. It may be possible to replace the existing processor: if you set the version number on your new processor ahead of the current 6809 processor version number, it could replace it. All instructions in existing programs based on the current module would then get re-disassembled when the new one is released.

If at all possible, the same register names should be used, as well as the same RAM space names. If there is a good reason to change them, you may still be able to rename them and replace the processor with your new one.

You can put in a PR and we can take a look.

emteere commented 2 years ago

I took a look at the 6800, 6805, H6309, and 6809 instruction sets. There are also the 6802 and 6808. It would be nice to bring these all together and get rid of the duplication. The processor specs could turn into a spaghetti mess, or it might work out and the idiosyncrasies could be contained in separate files.

I've found some issues with the 6809 and some of the H6309 addressing modes (many H6309 instructions are missing). I'll be putting some fixes and extensions into the 6809, and possibly some H6309 instructions, as time permits.

I took a look at your implementation and saw some of the same problems I saw in the 6x09: some to do with the H6309, but others affecting branching. You can't CALL or jump to a unique varnode directly; you have to use call [v] or goto [v]. I'm sure the 6805 has the same issues too, or worse. There was also an immediate export of a token, which in general shouldn't be done.

It might be possible to modify the 6x09 processor to include support for all the earlier 6800 variants, which seem fairly compatible in opcode encoding, although I didn't do a thorough evaluation.

I see collisions and a lot of similarities in instructions and encoding. How many collisions there are could make or break an integration refactoring. For example, the 6805 CLC collides with the 6809 EORA: CLC is 0x0C on the 6800 and 0x98 on the 6805, and is missing on the 6809, where 0x98 is EORA. So it could be a mess trying to bring them all together. But maybe there is enough duplication and exact decode matching to make it worthwhile. The @ifdefs could spin out of control and make the reuse not worth it. I think it is worth it, but only a careful comparison across all of them will really tell.
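One way to make that careful comparison concrete is to diff opcode tables mechanically. The Python sketch below assumes hypothetical hand-built dicts mapping opcode byte to mnemonic; the entries are only the CLC/EORA examples mentioned above, not complete opcode maps, and the function names are illustrative.

```python
# Sketch: mechanically diff opcode tables to find decode collisions.
# The tables are tiny, hand-picked subsets taken only from the examples
# discussed in this thread; they are NOT complete opcode maps.

# Hypothetical tables: opcode byte -> mnemonic
MC6800 = {0x0C: "CLC"}
MC6805 = {0x98: "CLC"}
MC6809 = {0x98: "EORA"}  # the 6809 has no CLC opcode


def collisions(a: dict[int, str], b: dict[int, str]) -> dict[int, tuple[str, str]]:
    """Opcodes defined in both tables that decode to different mnemonics."""
    return {
        op: (a[op], b[op])
        for op in sorted(a.keys() & b.keys())
        if a[op] != b[op]
    }


def relocated(a: dict[int, str], b: dict[int, str]) -> dict[str, tuple[int, int]]:
    """Mnemonics present in both tables but encoded at different opcodes.

    Assumes each mnemonic appears at most once per table; the full
    instruction sets would need tables keyed by (mnemonic, addressing mode).
    """
    pos_a = {m: op for op, m in a.items()}
    pos_b = {m: op for op, m in b.items()}
    return {
        m: (pos_a[m], pos_b[m])
        for m in sorted(pos_a.keys() & pos_b.keys())
        if pos_a[m] != pos_b[m]
    }
```

With these entries, `collisions(MC6805, MC6809)` reports 0x98 as CLC versus EORA, and `relocated(MC6800, MC6805)` reports CLC moving from 0x0C to 0x98. Filled out for every addressing mode, the size of these two results would be one rough measure of how many @ifdefs a merged spec would need.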

sigurasg commented 2 years ago

Thanks for taking a look! I didn't realize until just now that the 6805 is so different from the 6800/01/02/08, I had intended to cover it with the same spec.

I guess it should be possible to cover the 6800/6801/6802/6808 with this single spec: https://github.com/sigurasg/ghidra/blob/add_6800_specs/Ghidra/Processors/6805/data/languages/6800.sinc. I "wrote" this by copying the 6809 spec, changing the opcode specs that don't match the 6809 (0x00-0x20 IIRC), and then removing the 6809 addressing modes that don't exist on the 6800. I then did some reversing with the result and tweaked the spec until the decompiler seemed to give reasonable results.

My interest is in reversing Tek 2465 oscilloscope firmware, which uses the 6808. I'd assumed 6800/6801/6802 and 6808 were identical in the instruction set and architecture, aside from built-in peripherals and RAM/ROM. IDK whether that's true, but at least 6800/6802 and 6808 are compatible according to this datasheet: http://www.andysarcade.de/data/electronics/components/6802_6808.pdf.

I'm a n00b to Ghidra and working on GitHub, so I need more explicit direction :).

The exported token you speak of, is it this: https://github.com/sigurasg/ghidra/blob/add_6800_specs/Ghidra/Processors/6805/data/languages/6805.slaspec#L47?

Also, can you point me to a case of the CALL-to-varnode problem, and give a quick example of (or a pointer to) how it should be done?

Is this perhaps easier if I create a pull request (I'm new here)?

I couldn't find a good way to mash my processor spec into a unit test, so I wrote a .bin file with all the valid instructions and all the invalid opcodes to test against. This is IMHO a PITA though, as I can only discover PCODE problems by manual inspection - is there a better way?
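A test image like the one described above can be generated rather than assembled by hand. This is a minimal Python sketch under stated assumptions: each opcode gets a fixed 3-byte slot (opcode plus the longest 6800 operand, 16 bits), padded with 0x01, the MC6800 NOP, so shorter instructions run into NOPs and every slot still begins on an opcode byte. The slot layout, function names, and output filename are hypothetical, and the script does nothing to automate checking the P-code itself.

```python
# Sketch: generate a test image containing every opcode byte 0x00-0xFF.
# Assumption: each opcode gets a fixed 3-byte slot (the longest 6800
# instruction is opcode + 16-bit operand), padded with 0x01, the MC6800 NOP.
# Shorter instructions therefore run into NOPs and the next slot still
# begins on an opcode byte; operands decode as 0x01 / 0x0101.

NOP_6800 = 0x01
SLOT_SIZE = 3


def build_opcode_image() -> bytes:
    """Return 256 slots, one per opcode, each padded to SLOT_SIZE bytes."""
    image = bytearray()
    for opcode in range(0x100):
        image.append(opcode)
        image.extend([NOP_6800] * (SLOT_SIZE - 1))
    return bytes(image)


def slot_offset(opcode: int) -> int:
    """Byte offset of the slot holding `opcode` (start disassembly here)."""
    return opcode * SLOT_SIZE


def write_image(path: str) -> None:
    """Write the image to `path` (filename chosen by the caller)."""
    with open(path, "wb") as f:
        f.write(build_opcode_image())
```

For example, `write_image("all_opcodes_6800.bin")` writes a 768-byte image; disassembling at `slot_offset(op)` for each opcode lets valid and invalid opcodes be checked one slot at a time.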

sigurasg commented 2 years ago

After reading through a 6805 user manual (http://bitsavers.trailing-edge.com/components/motorola/6805/6805_Users_Manual_2ed_1983.pdf), I wonder if it's a mistake to lump the 6800/01/02/08/09 together with the 6805. It looks to me like the 6805 is an architecture and family distinct from the 6800/6809. The 6805 has A and X, which are both 8-bit registers, and the SP is only as wide as the device's memory allows, which is odd.

My suggestion would be to add both a 6800 and a 6809 directory, leaving the 6805 as-is (I don't have an interest in improving it myself). While there's clearly some commonality and evolution from the 6800 to the 6809, the two are very different and the Venn diagram of op codes and addressing modes that are common seems pretty small.

sigurasg commented 2 years ago

Appendix A of the 6805 user manual, starting at page 143, addresses 6800/6805 compatibility. It starts with: “Strictly speaking, the M6805 HMOS/M146805 CMOS Family is neither source- nor object-code compatible with the MC6800; but it is very similar to all M6800 Family processors. An experienced MC6800 programmer should have little difficulty adapting to the M6805 HMOS/M146805 CMOS Family instruction set. The following paragraphs enumerate the difference between the MC6800 and the M6805 HMOS/M146805 CMOS Family.”

It goes on to describe how it’s totally different in register architecture and how the instruction set is otherwise quite different.

emteere commented 2 years ago

From what you and I have read, I do think the 6805 would need to be a separate processor.

Neither processor is very large, so duplication wouldn't be as bad as it would be for other processors.

The 6800->6809 lineage does seem workable from what you have done. Ideally you could use @ifdef to isolate the variations and then configure them in the base 6800.slaspec. It may turn out, though, that there are too many collisions to merge all the specs.

emteere commented 2 years ago

The 6805 seems OK with respect to the issues I saw. In the 6x09:

  • L240: export imm8; should be export *[const]:1 imm8;
  • L808: goto OP2; should be goto [OP2];
  • L1104: call OP2; should be call [OP2];

You can only call or goto an address varnode, i.e. one where the sub-constructor computed a real address and exported that address. Otherwise the value is put into a unique varnode, and a direct goto to that unique would jump or call to its location in the unique space, not to the address it holds. A varnode is a triple of <space, offset, size>.

The 6805 doesn't appear to have the same issues as the 6x09. The JSR instruction in the 6805 uses JSR [ADDRI], where ADDRI would end up being an exported unique. It uses JSR ADDR correctly as well, because ADDR is a directly exported address varnode. If the ADDR subconstructor had assigned ADDR to a tmp and then exported that, the JSR ADDR would have needed to be JSR [ADDR].
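To make the direct versus indirect distinction concrete, here is a toy Python model of the branch semantics described above. It is an illustration only, not Ghidra's API: the `Varnode` class, the helper functions, the code space name "ram", the address 0xF800, and the unique offset 0x100 are all hypothetical.

```python
# Toy model of P-code branch targets (illustration only, not Ghidra's API).
# Direct branch (goto X / call X): the destination is X's own location.
# Indirect branch (goto [X] / call [X]): the destination is the value held
# in X, interpreted as an offset in the code space (assumed "ram" here).
from dataclasses import dataclass


@dataclass(frozen=True)
class Varnode:
    space: str   # address space name, e.g. "ram", "unique", "const"
    offset: int  # offset within that space
    size: int    # size in bytes


def direct_target(vn: Varnode) -> tuple[str, int]:
    """goto vn; / call vn;  -> jump to where vn itself lives."""
    return (vn.space, vn.offset)


def indirect_target(vn: Varnode, values: dict[Varnode, int],
                    code_space: str = "ram") -> tuple[str, int]:
    """goto [vn]; / call [vn];  -> jump to the address vn contains."""
    return (code_space, values[vn])


# ADDR-style subconstructor: exports the address varnode itself.
addr_vn = Varnode("ram", 0xF800, 1)
# Subconstructor that computed the target into a temporary and exported it.
tmp_vn = Varnode("unique", 0x100, 2)
values = {tmp_vn: 0xF800}  # hypothetical computed target held in the temporary
```

Here `direct_target(addr_vn)` gives ("ram", 0xF800), the intended target, while `direct_target(tmp_vn)` gives ("unique", 0x100), a location in the unique space, which is the bug described above. `indirect_target(tmp_vn, values)` recovers ("ram", 0xF800), which corresponds to writing goto [tmp] or call [tmp].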

sigurasg commented 2 years ago

Renamed the issue, because the 6800 disassembler is a distinct language from the 6805 disassembler: the two are neither source- nor object-code compatible.

emteere commented 2 years ago

I checked a few changes into our upcoming 10.1.2 patch, which should be out soon. It fixes the JSR/JUMP issues. I plan to:

  • refactor the directory to be MC6800
  • split up the 6805.ldefs into 6805/6800
  • add the H6309, since the addressing modes and a start on the instructions are there for it. I've seen some binaries that just use the extended addressing modes, so the unfinished H6309 variant of the MC6809 could be useful to someone.

This should pave the way for better supporting the other variants mentioned above. I didn't want to make these changes in the patch yet.

sigurasg commented 2 years ago

Sounds good. Should I wait until 10.1.2 is published, then update my patch and create a pull request?

(Replying by email to emteere's comment of Mon, Jan 24, 2022, 12:54 PM.)

sigurasg commented 2 years ago

I've moved the 6800 spec to MC6800, cleaned up some and created a pull request: #4043. Let me know if I'm holding it wrong - I'm new around here :).

sigurasg commented 2 years ago

I don't think closing this issue with that patch is right, as the MC6800 processor is still not covered. As discussed above, the MC6800 is not object compatible with the 6809.

emteere commented 2 years ago

The changes that were merged for the 6809 were an interim step to clean up the 6809, and possibly to allow the 6800 and 6809 to merge. I'm hoping what you have done in your PR, which identifies the differences between the 6809 and the 6800, can be merged in. I suspect you even found issues that are existing problems in the 6809.

It is possible they can be merged with a few @ifdefs. In the end there may be too many differences to make that possible, but it has been done with other processors.

So we'll leave this open until the dust settles.

sigurasg commented 2 years ago

I created a new (flattened) PR in NationalSecurityAgency/ghidra/pull/4055 per this comment https://github.com/NationalSecurityAgency/ghidra/pull/4043#issuecomment-1059479182 in the previous PR. Please take a look.