NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
51.25k stars 5.84k forks source link

Conflicting generic_clib compiler specific typedefs #1116

Open astrelsky opened 5 years ago

astrelsky commented 5 years ago

Describe the bug The following typedef datatype in the distributed generic_clib archive are causing size conflict with cspecs which have a pointer size of 4 and a long size of 8.

I'm sure there may be others but these are the two I know of off the top of my head.

Expected behavior They should be a typedef int so that they are the same size as the default pointer size.

Additional context As a result of the size difference the generic_clib is virtually unusable with cspecs that have a long size of 8 and pointer size of 4. Any struct which uses ptrdiff_t or size_t results in being incorrect and having the wrong size. If they can be changed to an int before the next ghidra release it would be greatly appreciated.

0x6d696368 commented 5 years ago

This is related to #257.

It seems the generic_clib data types don't work for everything. A fix may require multiple data type archives for different C libraries.

Regarding size_t issue:

size_t is defined by the C standard to be the unsigned integer return type of the sizeof operator (C99 6.3.5.4.4), and the argument of malloc and friends (C99 7.20.3.3 etc). The actual range is set such that the maximum (SIZE_MAX) is at least 65535 (C99 7.18.3.2).

(from https://stackoverflow.com/questions/918787/whats-sizeofsize-t-on-32-bit-vs-the-various-64-bit-data-models)

Expected behavior They should be a typedef int so that they are the same size as the default pointer size.

Unfortunately, that is not true. Depending on the data model used (see https://en.wikipedia.org/wiki/64-bit#64-bit_data_models) int can be 32 or 64-bit on 64-bit systems. According to the C99 spec size_t should be unsigned integer type... it doesn't define which unsigned type. :(

@astrelsky: What binary are you looking at? What is its data model?

Regarding ptrdiff_t:

This should be the address size ... however, even there C only defines it as the type of the result of subtracting two pointers. So even that could be 64-bit on 32-bit systems. :(

Not sure how to fix that besides have C library specific data type archives.

astrelsky commented 5 years ago

I am using this In the generic-clib, both ptrdiff_t and size_t are typedef long. Which for this cspec is 8 bytes wide but the address size is 4 bytes. The issue is the 32-bit clib not the 64 bit one.

If setting them to an unsigned/signed integer would cause an issue then maybe they should be builtin or typedef to a built in so their size it set accordingly dependent on the data organization.

0x6d696368 commented 5 years ago

I am using this In the generic-clib, both ptrdiff_t and size_t are typedef long. Which for this cspec is 8 bytes wide but the address size is 4 bytes. The issue is the 32-bit clib not the 64 bit one.

If setting them to an unsigned/signed integer would cause an issue then maybe they should be builtin or typedef to a built in so their size it set accordingly dependent on the data organization.

Both typedef long unsigned int size_t and typedef unsigned int size_t are correct. The C standard only requires it to be unsigned and integer and hold the result of sizeof(). Usually size_t is defined to long unsigned int.

Have you analyzed binaries with the cspec from https://github.com/beardypig/ghidra-emotionengine ? Do they have size_t data types as 8 or 4 bytes? I assume 4 bytes, hence the requested change. Edit: Just saw your Additional context and I take it the binaries you analyze use 4 bytes for size_t and ptrdiff_t.

Not sure what causes less issues, defining size_t to unsigned int or long unsigned int. But given that it currently is long unsigned int I don't think it should be changed without making sure it doesn't break existing behavior (which people may rely on).

Maybe a workaround is to use a dedicated clib.gdt used with the cspec from from https://github.com/beardypig/ghidra-emotionengine

astrelsky commented 5 years ago

One of the preprocessor definitions for the ps2dev ps2sdk is #define __SIZE_TYPE__ unsigned int I can still attach a binary if you'd like but from the definition it will be an unsigned int.

I thought it was defined this was in gcc but every definition appears to be long in regardless. This is the definition in vcruntime.h. I may have gotten the two of them confused. If so then I'll close this and just add size_t to #664

vcruntime.h's definition

// Definitions of common types
#ifdef _WIN64
    typedef unsigned __int64 size_t;
    typedef __int64          ptrdiff_t;
    typedef __int64          intptr_t;
#else
    typedef unsigned int     size_t;
    typedef int              ptrdiff_t;
    typedef int              intptr_t;
#endif
astrelsky commented 5 years ago

@0x6d696368 is it actually possible for a size_t on an architecture with a 4 byte pointer size to need to be an integer larger than 4 bytes? I don't think it is possible to have something with a size larger that the max address size. If it isn't is there really an issue with having it changed from long unsigned int to unsigned int? The only way I can think of the change causing an issue would be if the size of an int was larger than the size of a long, which would be absurd.

As for ptrdiff_t I have been manually creating a different ptrdiff_t for my C++ Class Analyzer. The specific code for fetching the correct integer is here I have yet to encounter an architecture where the incorrect ptrdiff_t size has been returned (no problems with vtable structs). Tested architectures are the ones which have a makefile here

0x6d696368 commented 5 years ago

@0x6d696368 is it actually possible for a size_t on an architecture with a 4 byte pointer size to need to be an integer larger than 4 bytes? I don't think it is possible to have something with a size larger that the max address size. If it isn't is there really an issue with having it changed from long unsigned int to unsigned int? The only way I can think of the change causing an issue would be if the size of an int was larger than the size of a long, which would be absurd.

It could cause problems when sizeof(int)=2 and sizeof(long)=4. Thinking non x86 processors here. It would be interesting to know from what C library the generic data type archive was generated. Also how does Ghidra decides which archives to autoload into programs?

As for ptrdiff_t I have been manually creating a different ptrdiff_t for my C++ Class Analyzer. The specific code for fetching the correct integer is here I have yet to encounter an architecture where the incorrect ptrdiff_t size has been returned (no problems with vtable structs). Tested architectures are the ones which have a makefile here

Yes, GCC defines sizeof(ptrdiff_t) = sizeof(size_t). But a C library can define size_t and ptrdiff_t however it likes. And many embedded libraries have it as long. Clearly having them the same size as the address width makes sense, but isn't required by the C standard. So it would be really interesting to know from which C library the generic archives were generated.


Edit: I checked some libraries for size_t:

So it should be OK to just change it to int ... but it would still be interesting to know why it currently is long.