NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
51.97k stars 5.9k forks source link

V850 BranchLessThen incorrect behavior #7190

Open esaulenka opened 6 days ago

esaulenka commented 6 days ago

Describe the bug

Decompiler produces a weird code for sequence SXB CMP BLT (sign extend byte, compare, branch if less than). Perhaps it was a simple loop with a int8_t counter.

To Reproduce

  1. Save attached code as .hex file
  2. Open it as a V850:LE:32:default (1.0)
  3. Decompile it.
  4. Inner loop has some strange condition, that is always false according the standard C rules.

Expected behavior

The inner loop is just a do { .... } while (iVar6 < 8); if I'm reading assembler correctly.

Assembler:

FUN_00000000
     mov        r6,ep
     sxh        r7
     ori        0xa5a5 ,r0,r10
     br         LAB_0000002e
LAB_0000000a                                    XREF[1]:     00000030 (j)   
     sld.bu     0x0 [ep],r1
     add        -0x1 ,r7
     mov        0x0 ,r14
     add        0x1 ,ep
LAB_00000012                                    XREF[1]:     0000002c (j)   
     andi       0x1 ,r10 ,r12
     sar        0x1 ,r10
     andi       0x1 ,r1,r13
     xor        r12 ,r13
     be         LAB_00000024
     xori       0x6bf7 ,r10 ,r10
LAB_00000024                                    XREF[1]:     0000001e (j)   
     add        0x1 ,r14
     sar        0x1 ,r1
     sxb        r14                     <<<
     cmp        0x8 ,r14                <<<
     blt        LAB_00000012            <<<
LAB_0000002e                                    XREF[1]:     00000008 (j)   
     cmp        r0,r7
     bgt        LAB_0000000a
     jmp        [lp]

Actual decompiled code:

uint FUN_00000000(byte *param_1,short param_2)
{
  uint uVar1;
  int iVar2;
  uint uVar3;
  uint uVar4;
  char cVar5;
  int iVar6;

  iVar2 = (int)param_2;
  uVar3 = 0xa5a5;
  while (0 < iVar2) {
    uVar1 = (uint)*param_1;
    iVar2 += -1;
    iVar6 = 0;
    param_1 = param_1 + 1;
    do {
      uVar4 = uVar3 & 1;
      uVar3 = (int)uVar3 >> 1;
      if ((uVar1 & 1) != uVar4) {
        uVar3 ^= 0x6bf7;
      }
      cVar5 = (char)iVar6 + '\x01';
      uVar1 = (int)uVar1 >> 1;
      iVar6 = (int)cVar5;
    } while (iVar6 + -8 < 0 != (cVar5 >> 7 < '\0' && -1 < iVar6 + -8));     ?? what is it?!
  }
  return uVar3;
}

I tried to find the bug, but it looks like both CMP and BLT instructions written as described in the spec.

Also I have checked other combinations, and sometimes it works a bit better: ADDI loopLength, loopCounter, 0; BLT makes do { ... } while (iVar1 < 0 != SCARRY(iVar3, loopLength) (see the second .hex sample)

Environment (please complete the following information):

Additional context

original code, save it as .hex file

:020000040000FA
:2000000006F0E7008056A5A5B51560085F3A007241F2CA660100A152C16E01002C69B205D3
:14002000AA56F76B4172A10AAE006872B6F5E039DFED7F0075
:00000001FF

And this one decompiled bit better

:020000040000FA
:20000000800721000032023A00720E08C20A3E06E058DEFEC1F10105C40941728107CF048B
:1200200079FF0E06FBFEA6F5A4078D0616FF40063F00D6
:00000001FF
GhidorahRex commented 5 days ago

I think the issue here is mainly the OV flag macro.

macro set_OV_neg(var1, var2)
{   
    local A:4 = var1;
    local B:4 = var2;
    local R = A - B;

    local A1 = A[31,1];
    local B1 = B[31,1];
    local R1 = R[31,1];

    $(OV) = (A1 != B1) && (B1 == R1);

    #OV = 1 if:
    #pos - neg = neg
    #neg - pos = pos
}

the local A1 = A[31,1]; leads to some messy decompilation. What it's really doing is testing for sign, so a better format is probably:

macro set_OV_neg(var1, var2)
{   
    local A:4 = var1;
    local B:4 = var2;
    local R = A - B;

    local A1 = A s< 0;
    local B1 = B s< 0;
    local R1 = R s< 0;

    $(OV) = (A1 != B1) && (B1 == R1);

    #OV = 1 if:
    #pos - neg = neg
    #neg - pos = pos
}
esaulenka commented 5 days ago

Thanks, that's a good point. Decompiled code become a bit more clear ... while (iVar5 + -8 < 0 != (iVar5 < 0 && -1 < iVar5 + -8)); but still dirty.

PS if you try to compile it now, this loop runs exactly 8 times, as expected.

LukeSerne commented 4 days ago

The condition (iVar5 + -8 < 0 != (iVar5 < 0 && -1 < iVar5 + -8)) is equivalent to (iVar5 < 8), as can be shown using the following derivation:

    (iVar5 + -8 < 0 != (iVar5 < 0 && -1 < iVar5 + -8))
= { add 8 to both sides of equations involving 'iVar5 + -8' }
    (iVar5 + -8 + 8 < 8 != (iVar5 < 0 && -1 + 8 < iVar5 + -8 + 8))
= { RuleCollapseConstants: -8 + 8 => 0, -1 + 8 == 7 }
    (iVar5 + 0 < 8 != (iVar5 < 0 && 7 < iVar5))
= { RuleRangeMeld: (7 < x && x < 0) leaves no values for x, so this is always false }
    (iVar5 < 8 != false)
= { RuleIdentityEl: (x + 0) -> x }
    (iVar5 < 8 != (iVar5 < 0 && 7 < iVar5))
= { RuleBooleanNegate: bool != false  =>  bool }
    (iVar5 < 8)

It would be nice if RuleRangeMeld could 'see through' the constant arithmetic, and also work for cases like (V s< 0) && (-1 s< V + -8). If that was the case, I think this expression would simplify nicely.