NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
51.12k stars 5.82k forks source link

Not tracking function parameter names and types #651

Open teaalltr opened 5 years ago

teaalltr commented 5 years ago

Describe the bug In the decompiler, when importing a PDB file (with private debug symbols), I can make it rename the function accordingly (using the edit from issue 94), but it doesn't change the types of the arguments and their name.

When importing an XML from IDA idb using the included python plugin, the same problem shows up, in that it doesn't rename the parameters and types, only function names.

The problem shows up both with exes and PEF executables

emteere commented 5 years ago

Are the names mangled? Did you run analysis after applying the PDB information? Is the function signature defined in the DataType archive for windows? Is this occuring on all parameters?

Can you provide a snapshot of a symbol where this is occuring?

The PDB import is currently being re-written to extract more information.

teaalltr commented 5 years ago

Yes, names are all mangled. I analyzed an exe and a PEF (classic Mac executable). The issue with the PDB is of course only present with the exe.

I did run the analysis both during initial import of file and after. The function signatures of the PDB are not defined in the DataType Archive. Yes, it occurs on all params.

I'm having some problems running Ghidra after the last Windows update, 1903, it runs but it can't find the decompiler.exe. Maybe Win resets/deletes the PATH or something like that, I will see. Will provide a snapshot asap.

You mean you are rewriting a tool in java, independent from Microsoft's? Could you point me to the files being rewritten? Thanks

UPDATE: the issue with Win 1903 turns out to be a problem on my part, not Win or Ghidra fault; it was me choosing the 32 bit executable of Java, so no matter at all, everything working now.

teaalltr commented 5 years ago

@emteere Here a sample function (what I see from decompiler window)

undefined4 __thiscall
TShape::shape_shadow_unclipped(TDrawArea_*,long,long,uchar_*)
          (int iParm1,int param_1,int param_2,uint param_3,int param_4)

{
    byte bVar1;
    int iVar2;
    bool bVar3;
    int iVar4;
    byte *pbVar5;
    byte *pbVar6;
    byte *pbVar7;
    byte *pbVar8;
    int iVar9;
    dword width;

    iVar2 = *(int *)(iParm1 + 12);
    pbVar8 = (byte *)(iVar2 + 24);
    iVar9 = (*(int *)(iVar2 + 20) - *(int *)(iVar2 + 12)) + 1;
    iVar2 = *(int *)(param_1 + 36);
    iVar4 = TDrawArea::AlignedWidth(void)();
    iVar4 *= iVar2;
    if (iVar2 < 1) {
        param_3 = (param_3 - *(int *)(param_1 + 28)) + 1;
    }
    pbVar6 = (byte *)(param_3 * iVar4 + *(int *)(param_1 + 8) + param_2);
    param_1 = iVar9;
    if (0 < iVar9) {
        do {
            bVar3 = false;
            pbVar5 = pbVar6;
            pbVar7 = pbVar8;
            do {
                bVar1 = *pbVar7;
                pbVar8 = pbVar7 + 1;
                if ((bVar1 & 1) == 1) {
                    param_3 = (uint)(bVar1 >> 1);
                    if (bVar1 >> 1 == 0) {
                        pbVar5 = pbVar5 + (uint)*pbVar8;
LAB_004b9ac2:
                        pbVar8 = pbVar7 + 2;
                    }
                    else {
                        pbVar7 = pbVar5 + param_3;
                        while (pbVar5 < pbVar7) {
                            *pbVar5 = *(byte *)((uint)*pbVar5 + param_4);
                            pbVar5 = pbVar5 + 1;
                        }
                        pbVar8 = pbVar8 + param_3;
                    }
                }
                else {
                    param_3 = (uint)(bVar1 >> 1);
                    if (bVar1 >> 1 != 0) {
                        pbVar8 = pbVar5 + param_3;
                        while (pbVar5 < pbVar8) {
                            *pbVar5 = *(byte *)((uint)*pbVar5 + param_4);
                            pbVar5 = pbVar5 + 1;
                        }
                        goto LAB_004b9ac2;
                    }
                    bVar3 = true;
                }
                pbVar7 = pbVar8;
            } while (!bVar3);
            pbVar6 = pbVar6 + iVar4;
            param_2 = param_1 + -1;
            param_1 = param_2;
        } while (param_2 != 0);
    }
    return CONCAT31((int3)((uint)param_2 >> 8),1);
}

When exported to C file, the name becomes TShape__shape_shadow_unclipped_TDrawArea___long_long_uchar___

This is after applying XML exported symbols from IDA (6.8) database. The PDB was imported in IDA, not in Ghidra. As you can see, it correctly names the function, but it appends the type information to the name of the function, not committing it to actual parameters names and types. Ghidra however correctly imports/recognizes (not all but) lots of type names from the XML - as I can see them in the datatype window.

Here's what IDA does and what I'd expect (note the function params, all committed and propagated throughout the function body)

char __thiscall TShape::shape_shadow_unclipped(TShape *this, TDrawArea *drawarea, int shape_x0, int shape_y0, char *color_table)
{
  Shape_Header *v5; // eax@1
  unsigned __int8 *v6; // ebp@1
  int v7; // esi@1
  int v8; // ebx@1
  int v9; // eax@1
  int v10; // ecx@2
  char *v11; // ecx@4
  char v12; // bl@6
  unsigned __int8 v13; // dl@7
  unsigned int j; // edx@10
  int v15; // eax@11
  unsigned int i; // edx@15
  int v17; // eax@16
  bool v18; // zf@19
  int width; // [sp+10h] [bp-4h]@1
  TDrawArea *drawareaa; // [sp+18h] [bp+4h]@5
  int shape_x0a; // [sp+1Ch] [bp+8h]@4
  unsigned __int8 save_buffer; // [sp+20h] [bp+Ch]@8

  v5 = this->shape_header;
  v6 = (unsigned __int8 *)&v5[1];
  v7 = v5->ymax - v5->ymin + 1;
  v8 = drawarea->Orien;
  v9 = v8 * TDrawArea::AlignedWidth(drawarea);
  width = v9;
  if ( v8 >= 1 )
    v10 = shape_y0 * v9;
  else
    v10 = v9 * (shape_y0 - drawarea->Height + 1);
  v11 = &drawarea->Bits[v10] + shape_x0;
  shape_x0a = (int)v11;
  if ( v7 > 0 )
  {
    drawareaa = (TDrawArea *)v7;
    do
    {
      v12 = 0;
      do
      {
        v13 = *v6++;
        if ( (v13 & 1) != 1 )
        {
          if ( !(v13 >> 1) )
          {
            v12 = 1;
            continue;
          }
          for ( i = (unsigned int)&v11[v13 >> 1]; (unsigned int)v11 < i; *(v11 - 1) = color_table[v17] )
            v17 = (unsigned __int8)*v11++;
          goto LABEL_17;
        }
        save_buffer = v13 >> 1;
        if ( !(v13 >> 1) )
        {
          v11 += *v6;
LABEL_17:
          ++v6;
          continue;
        }
        for ( j = (unsigned int)&v11[save_buffer]; (unsigned int)v11 < j; *(v11 - 1) = color_table[v15] )
          v15 = (unsigned __int8)*v11++;
        v6 += save_buffer;
      }
      while ( !v12 );
      v11 = (char *)(width + shape_x0a);
      v18 = drawareaa == (TDrawArea *)1;
      shape_x0a += width;
      drawareaa = (TDrawArea *)((char *)drawareaa - 1);
    }
    while ( !v18 );
  }
  return 1;
}
teaalltr commented 5 years ago

The PDB import is currently being re-written to extract more information. I just checked the XML importer:

def import_typedef(self, type_def):
# import_typedef: NOT IMPLEMENTED
if self.options.DataTypes.checked == False:
return
self.update_counter(TYPE_DEF)

so types are not currently imported properly?

teaalltr commented 4 years ago

Update: still present in 9.2 dev trunk

teaalltr commented 1 week ago

@emteere Still present in version 11.2. Is it being worked on?

emteere commented 1 week ago

There have been quite a few changes that would affect the current state of the issue you originally reported. For example the PDB analysis is totally re-written. It probably still has issues with local variables. Is it exactly the same as above, or have some things gotten better?

Can you find your example and update it to what Ghidra 11.2 does?

emteere commented 1 week ago

Also do you have a simple case showing the issues you are experiencing?

chf0x commented 2 days ago

Hey everyone, I have a simple example to share. In IDA, function arguments are recovered without any issues, but in Ghidra, I only see generic names like param1, param2, etc. TestApp.zip