NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Add support for tainted near/far pointers #199

Open Godzil opened 5 years ago

Godzil commented 5 years ago

Is your feature request related to a problem? Please describe. On systems such as x86 real mode, C compilers work around segmentation with tainted pointers, using the non-standard C keywords far and near to match the x86 notion of a pointer. When disassembling existing code, it is impossible to tell the size of a pointer, and Ghidra simply assumes that a void * is a 16-bit (near) pointer. When importing headers, the far and near keywords are ignored.

Describe the solution you'd like: It would be nice to have proper support for the tainted keywords far and near, so that Ghidra knows whether a pointer is 16-bit (offset only) or 32-bit (segment + offset). Also, importing a header in the context of a real-mode x86 CPU should not ignore the far and near keywords.

Describe alternatives you've considered: Cheating by using ulong instead of void far * is not a good solution, because we lose the fact that the value is a pointer and not just a number.

Also, keep in mind that a far pointer is not a linear address but a Segment:Offset pointer.
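
To illustrate the Segment:Offset point, here is a minimal sketch in portable C of how a real-mode far pointer maps to a linear address. The far_ptr struct and the far_to_linear function are hypothetical names, not part of Ghidra or of any compiler, and the 20-bit wrap assumes an 8086-style address bus.

```c
#include <stdint.h>

/* Hypothetical model of a real-mode far pointer: two 16-bit halves. */
struct far_ptr {
    uint16_t offset;
    uint16_t segment;
};

/* Real-mode translation: linear = segment * 16 + offset.
 * The result is masked to 20 bits, assuming an 8086-style address bus. */
uint32_t far_to_linear(struct far_ptr p)
{
    return (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFFu;
}
```

Note that B800:0000 and B000:8000 both translate to linear address 0xB8000, so comparing the raw 32-bit values of two far pointers does not tell whether they refer to the same byte.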

Additional context: N/A

ghidra1 commented 5 years ago

We currently lack the ability to retain pointer attributes other than size and referenced datatype. The notion of near/far seems a reasonable generic pointer concept which could be applied in the future.

Within the listing you can utilize an explicitly sized pointer (e.g., pointer32) which can be used as a far pointer. When applied on pointers stored in memory it will interpret the bytes properly and may be handled within the decompiler (I'm not certain but I would hope so).
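
For reference, a far pointer stored in real-mode x86 memory occupies four bytes: a little-endian 16-bit offset followed by a little-endian 16-bit segment, which is the layout the LDS and LES instructions expect. The sketch below decodes such a value in portable C. The struct and function names are hypothetical, and this says nothing about how Ghidra's pointer32 actually interprets the bytes.

```c
#include <stdint.h>

/* Hypothetical decoded form of a 4-byte far pointer. */
struct seg_off {
    uint16_t segment;
    uint16_t offset;
};

/* Decode a far pointer as stored in x86 memory:
 * bytes 0-1 hold the offset, bytes 2-3 hold the segment, both little-endian. */
struct seg_off decode_far_pointer(const uint8_t bytes[4])
{
    struct seg_off p;
    p.offset  = (uint16_t)(bytes[0] | (bytes[1] << 8));
    p.segment = (uint16_t)(bytes[2] | (bytes[3] << 8));
    return p;
}
```

For example, the bytes 34 12 00 B8 decode to B800:1234.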

Regarding header parsing: Since the mapping of "far pointer" to a 32-bit pointer is sensitive to the target compiler spec and without the ability to retain pointer metadata, a header C-Parse would need to specifically target the x86-16 to map far pointers to pointer32.

Godzil commented 5 years ago

Well, yes, pointer32 allows us to use the proper size, but if the far pointer points to a known structure we lose all of the information about it, which really doesn't help when reversing code.

And the more complex the structure, the more difficult the task is. To take a concrete example, a structure like this one is really difficult to follow without the proper types:

struct _FsIL {
    IL super;
    fent_t far *(far *_entries)(FS fs);
    int (far *_n_entries)(FS fs);
    int (far *_getent)(FS fs, int n, fent_t far *fep);
    int (far *_findent)(FS fs, char far *fname, fent_t far *fep);
    void far *(far *_mmap)(FS fs, char far *fname);
    int (far *_open)(FS fs, char far *fname, int mode, int perms);
    int (far *_close)(int fd);
    int (far *_read)(int fd, char far *buf, int len);
    int (far *_write)(int fd, char far *buf, int len);
    long (far *_lseek)(int fd, long offset, int origin);
    int (far *_chmod)(FS fs, char far *fname, int mode);
    int (far *_freeze)(FS fs, char far *fname);
    int (far *_melt)(FS fs, char far *fname);
    int (far *_creat)(FS fs, fent_t far *fep);
    int (far *_unlink)(FS fs, char far *fname);
    int (far *_newfs)(FS fs);
    int (far *_defrag)(FS fs);
    unsigned long (far *_space)(FS fs);
};

GregoryMorse commented 5 years ago

I would point out that there are also different memory models for the compiler and for functions.

There are near and far code, and near and far data, which yields 4 possible combinations. Since both function pointers and data pointers are possible, all of this needs to be taken into account. A lot of bugs have been fixed in the backend (see the PRs), but the frontend still lacks the ability to take advantage of the fixes, as having proper type information is crucial for good output.
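
The four combinations correspond to the classic 16-bit memory models offered by DOS-era compilers. Here is a small C sketch of the default pointer sizes each one implies. The enum and function names are hypothetical, and the tiny and huge models are deliberately left out.

```c
#include <stddef.h>

/* Classic 16-bit x86 memory models (tiny and huge omitted). */
enum memory_model {
    MODEL_SMALL,   /* near code, near data */
    MODEL_MEDIUM,  /* far code,  near data */
    MODEL_COMPACT, /* near code, far data  */
    MODEL_LARGE    /* far code,  far data  */
};

/* Default size in bytes of a function pointer under the given model. */
size_t default_code_pointer_size(enum memory_model m)
{
    return (m == MODEL_MEDIUM || m == MODEL_LARGE) ? 4 : 2;
}

/* Default size in bytes of a data pointer under the given model. */
size_t default_data_pointer_size(enum memory_model m)
{
    return (m == MODEL_COMPACT || m == MODEL_LARGE) ? 4 : 2;
}
```

These are only defaults: an explicit near or far keyword overrides them for an individual pointer, which is why the attribute has to be tracked per pointer rather than per program.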

claunia commented 4 years ago

Well, this basically makes Ghidra quite useless for 16-bit OS/2, where by DEFAULT a pointer is a far pointer, but Ghidra marks all pointers as 16-bit, and I cannot find any way of changing the default decompiler pointer size to 32-bit.

Godzil commented 4 years ago

Well, a far pointer is not really a 32-bit pointer either, as 32-bit pointers are linear pointers and not segmented pointers. Ghidra needs to add proper support for segmentation and the related pointers, meaning it needs to support the fact that a pointer may need to store more information than just its size.

That said, in the assembly code the opcode for a far call is not the same as for a near call, so if the disassembler shows a near pointer, that is probably what the code is doing, whatever the "default" on an operating system is. The OS does not change the application code. The "default" value for compilation is, I think, useless for the disassembler/decompiler: this information tells the compiler what type of pointer to use by default in the generated code, whereas Ghidra uses the generated code to derive the pointer size.
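
The point about opcodes can be made concrete. In 16-bit x86 code, near and far calls and returns use distinct encodings, so the distance of a call is visible in the instruction stream itself. A minimal sketch in C follows. The enum and function names are hypothetical, and any prefix bytes are assumed to have been skipped already.

```c
#include <stdint.h>

enum transfer_kind { XFER_OTHER, XFER_NEAR, XFER_FAR };

/* Classify a call or return from its opcode byte and, for opcode 0xFF,
 * the ModRM byte that follows it. Prefix bytes are assumed already skipped. */
enum transfer_kind classify_call_or_ret(uint8_t opcode, uint8_t modrm)
{
    switch (opcode) {
    case 0xE8:              /* CALL rel16 */
    case 0xC3:              /* RET */
    case 0xC2:              /* RET imm16 */
        return XFER_NEAR;
    case 0x9A:              /* CALL ptr16:16 */
    case 0xCB:              /* RETF */
    case 0xCA:              /* RETF imm16 */
        return XFER_FAR;
    case 0xFF:
        /* The reg field (bits 5..3) of ModRM selects the operation. */
        switch ((modrm >> 3) & 7) {
        case 2:  return XFER_NEAR;  /* CALL r/m16 */
        case 3:  return XFER_FAR;   /* CALL m16:16 */
        default: return XFER_OTHER; /* not a call (INC, DEC, JMP, PUSH) */
        }
    default:
        return XFER_OTHER;
    }
}
```

An indirect call through a function pointer, such as the members of the _FsIL structure above, would show up as FF /3 when the pointer is far and FF /2 when it is near.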

Wall-AF commented 3 years ago

Is this on anyone's radar?