teaalltr opened 4 years ago
This may not exactly be helpful, but at the moment I find it much more reliable and easier to pull the datatypes from the debug libraries and create datatype archives from the extracted DWARF information. Then again, this requires access to the debug versions of the libraries, and it can only get the types that are actually used by the library itself.
If you choose just the directory that contains your set of header files to be imported, an algorithm attempts to select the root header files in the correct order and adds them to the set of files to parse.
Really it isn't much different from trying to find the correct order for include files when writing code.
I agree with both comments: header files are a pain to deal with, and if you have good debug information, that can be a better source of data type information.
Parsing header files attempts to discover the values of defines from macros and add them as enums, which is information you lose with debug info. There are some changes planned to better recover from multiple definitions; currently only the last definition for a define is kept.
An additional change to pull unknown datatypes from any archive that is open at the time of the parse could help as well. Note that if you were to compile code that included a header file, the correct pre-include files would be necessary as well.
It would be helpful to have special defines for things like abstract integer sizes. It is currently impossible to parse a struct with a bit field of type long with a size greater than 32, because even if the define for setting the long size is set, the parser still assumes it is 4 bytes and throws an error.
A workaround that I found to parse C header files is to use intermediate files. Intermediate files can be generated using the `-S -save-temps` gcc options. These intermediate .i files are the result of the preprocessing stage: all macros are expanded and all header files are merged, in the right order, into a single file, with some debug comments added.
Steps for the workaround:
1. Compile your code with the `-S -save-temps` gcc option. This will generate a .i file.
2. Import the .i file into Ghidra's parser.

Some issues I encountered: I still had to make some manual adjustments to my new header files for some edge cases. There are `__attribute__` directives in the intermediate files that Ghidra's CPreParser doesn't like, so these need to be removed. (These issues aren't introduced by the workaround; they are the only remaining limitations of Ghidra's CPreParser after using the workaround.)
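Concretely, the steps above might look like this (the file names and the sed pattern are illustrative; the pattern only handles attributes without nested parentheses, so some manual cleanup can remain):

```shell
# Hypothetical source using an attribute that survives preprocessing.
printf 'int f(void) __attribute__((noinline));\nint f(void) { return 1; }\n' > demo.c
# Generate the preprocessed .i file next to the source.
gcc -S -save-temps demo.c
# Crude removal of simple __attribute__((...)) directives that CPreParser
# rejects; attributes with nested parentheses need manual fixes.
sed -E 's/__attribute__[[:space:]]*\(\([^()]*\)\)//g' demo.i > demo_clean.i
```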
I used this workaround successfully to create archives for the fairly big GTK-2.0 library on Ubuntu 18.04.
I will say the current parser is not just insufficient - it's unusable. I tried to import even partial headers of Linux kernel types, and I gave up after two hours of fighting with errors like "error in line -20".
Parsing C headers into archives is a pain. A naive approach would be to parse the headers in a folder (recursively in each subfolder) in every possible order (i.e. changing the sort order of the headers) until one succeeds. Backtracking could also be used, and command-line options could be handled the same way.
This could be implemented as an option (it may require some time to complete).