NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
49.12k stars 5.65k forks source link

sleigh: incorrect "ERROR: Constructors with identical patterns" #6676

Open shuffle2 opened 6 days ago

shuffle2 commented 6 days ago

Describe the bug Consider the following (the exact values / registers do not relate to the architecture spec, just a minimized repro showing the bug in sleigh compiler):

define register offset=0x100 size=4 [ INT_TRIGGER TECR1 PFMC1 ];

define token instr32(32) endian=big
    SrMaj=(17,19)
    SrMin=(13,16)
;

sridx: INT_TRIGGER  is INT_TRIGGER  & SrMaj=0b001 & SrMin=0b1001 { export INT_TRIGGER; }
sridx: TECR1        is TECR1        & SrMaj=0b010 & SrMin=0b1110 { export TECR1; }
sridx: PFMC1        is PFMC1        & SrMaj=0b100 & SrMin=0b0000 { export PFMC1; }

:MFSR u20_24, sridx                 is $(I32) & $(MISC) & u20_24 & sridx & u5_9i=0 & subop=0b00010 { u20_24 = sridx; }
:MFSR u20_24, (SrMaj, SrMin, SrExt) is $(I32) & $(MISC) & u20_24 & SrMaj & SrMin & SrExt & u5_9i=0 & subop=0b00010 unimpl

The above will give error like

ERROR AndeStar.sinc:435: Constructors with identical patterns:
   table "instruction" constructor from AndeStar.sinc:435
   table "instruction" constructor from AndeStar.sinc:434 (SleighCompile)
ERROR AndeStar.sinc:434: Constructors with identical patterns:
   table "instruction" constructor from AndeStar.sinc:435
   table "instruction" constructor from AndeStar.sinc:434 (SleighCompile)
ERROR No output produced (SleighCompile)

Commenting one of the sridx lines, or clearing one of the set bits in their patterns, will result in successful compilation.

This is incorrect behavior for at least 2 reasons:

  1. The fact that, e.g. SrMaj patterns ORd together result in all-ones does not entail that all values of SrMaj are pattern-matched.
  2. It ignores other fields - such as SrMin and SrExt in this case, which also have not been exhaustively pattern-matched.

The idea here is that MFSR u20_24, sridx will be used for any register encoding where sridx is defined. Any unknown register encoding will result in different operand display and unimpl semantics. This technique works fine in practice until you happen to create tables with the same name which , when their patterns are OR'd together, result in the OR result being all-ones (i.e. if you OR all SrMaj patterns and SrMin patterns, you'll get 0b111 and 0b1111, respectively). Notably the error only occurs when 3 or more tables result in the OR condition being met. Additionally, the problem only occurs if the subtable is referenced from 2 or more constructors.

Expected behavior It should be possible to create such "fallthrough" instruction constructors.

Environment (please complete the following information):

Additional context Add any other context about the problem here.

emteere commented 6 days ago

Not digging into the potential bug here, and I understand this is an example, the following is a comment on what I understand you are implementing.

I suspect if you refactor your implementation so that there is only one constructor and the second constructors operand is pushed into the sub-constructor, this will work for what you are trying to do.

define pcodeop UnknownSFR;

sridx: INT_TRIGGER                  is INT_TRIGGER  & SrMaj=0b001 & SrMin=0b1001 { export INT_TRIGGER; }
sridx: TECR1                              is TECR1        & SrMaj=0b010 & SrMin=0b1110 { export TECR1; }
sridx: PFMC1                              is PFMC1        & SrMaj=0b100 & SrMin=0b0000 { export PFMC1; }
sridx: (SrMaj, SrMin, SrExt)        is SrMaj & SrMin & SrExt { unkval:2 = UnkownSFR(SrMaj:1, SrMin:1, SRExt:1); }

:MFSR u20_24, sridx                 is $(I32) & $(MISC) & u20_24 & sridx & u5_9i=0 & subop=0b00010 { u20_24 = sridx; }

If an unknown maj,min,ext location is used in an MFSR instruction, you will see it in the decompilation.

An aside note, we're also looking at better supporting implementing special register spaces, control registers / coprocessor registers read/write, as separate memory spaces. This has the advantage that as you RE a device that uses unknown registers, you can name, comment, and see references to the registers in the space. I believe there will be some refactoring on existing processors to utilize this idea. This is similar to how devices that memory map registers into ram locations currently operate.

This also has the advantage that for processor variants that really only vary in the special registers you can either create a plugin to apply the register names for the variant, or create a .pspec for the variant without changing the underlying sleigh file.

You can do this now, but there are few changes that need to be made to better support this idea.

shuffle2 commented 6 days ago

Thanks! something like that does work:

sridx: (SrMaj, SrMin, SrExt) is SrMaj & SrMin & SrExt { dummy:4 = Unknown_SR(SrMaj:1, SrMin:1, SrExt:1); export dummy; }
:MFSR u20_24, sridx is u20_24 & sridx & u5_9i=0 & subop=0b00010 { u20_24 = sridx; }
:MTSR u20_24, sridx is u20_24 & sridx & u5_9i=0 & subop=0b00011 { sridx = u20_24; }

(adding export). I had thought exporting a locally-created symbol (which is written after export) would be sketchy, but seems to work well enough.

I'll let someone on the sleigh dev team close or take this issue as they see fit.

emteere commented 5 days ago

Oops forgot the export. Glad it worked for you.