Closed TylerKann closed 4 years ago
Function.getParameters()
returns the functions parameters as they have been defined. If you know what the parameters should be you need to set them as such.
struct str_t {
char *buf;
int len;
};
unsigned int f(struct str_t *str) {
unsigned int hash = 5381;
unsigned int i = 0;
for (i = 0; i < str->len; i++)
{
hash = ((hash << 5) + hash) + str->buf[i];
}
return hash;
}
I dont fully understand your comment. Here i have a simple structure with a pointer to it. In a perfect world it would return struct str_t, but that wont happen. Even if it returned some type of char* that would be good. But it returns long ? I have no clue why this would happen?
Well no, it should return an unsigned int as per the function signature of f.
If you have an undefined function like below:
You must change the signature to match what it should be:
In the root of your ghidra installation there is a folder named docs. From there is another folder name GhidraClass. I recommend going through its contents.
Okay thanks, Ill check that out! The goal is doing this automatically (headless), so not too keen on individually modifying the functions. Why would it return unsigned int? That is what the function is returning not the input
Okay thanks, Ill check that out! The goal is doing this automatically (headless), so not too keen on individually modifying the functions. Why would it return unsigned int? That is what the function is returning not the input
Unfortunately at some point you're going to have to do some things manually if you need that level of detail.
When you said returning I thought you meant the functions return type instead of the parameter's type. Doing as shown above would make it the proper struct str_t *
assuming you have created the struct str_t
data type.
You can set the parameters in headless mode but it is not as convenient. It is likely easiest to create a FunctionDefinitionDataType and apply it to a function of your choice.
FunctionDefinitionDataType ApplyFunctionSignatureCmd
You can also use ApplyFunctionDataTypesCmd to apply the signatures in the provided datatype archives.
If you would like to get a lot out of this you could get the dbg libraries and create a datatype archive of all the function definitions and datatypes and then use the standard libraries to create a FunctionID database file (fidb). Then via healess analysis you could detect a ton of known functions which would get labeled and then apply the correct signatures/datatypes with the ApplyFunctionDataTypesCmd. This is just a thought though. It may or may not work out.
int main(int argc, char *argv[]) {
struct str_t str;
str.buf = argv[1];
str.len = strlen(str.buf);
I have created the structure, yet it does not show that
int main(int argc, char *argv[]) { struct str_t str; str.buf = argv[1]; str.len = strlen(str.buf);
I have created the structure, yet it does not show that
Please consider providing more information.
#include <stdlib.h>
#include <string.h>
struct str_t {
char *buf;
int len;
};
unsigned int f(struct str_t str) {
unsigned int hash = 5381;
unsigned int i = 0;
for (i = 0; i < str.len; i++)
{
hash = ((hash << 5) + hash) + str.buf[i];
}
return hash;
}
int main(int argc, char *argv[]) {
struct str_t str;
str.buf = argv[1];
str.len = strlen(str.buf);
printf("%d \n", f(str));
return EXIT_SUCCESS;
}
I get the function in python by doing a simple looping checking if function.getFunctionAfter(function) is ever the name of my function (hash). Once I find it, I "prime" my function by decompiling it. This is done by getting the HighFunction
d = interface.decompileFunction(f, 5, None)
high = d.getHighFunction()
Once I have the HighFunction, i commitParamsToDB :
y = ghidra.program.model.pcode.HighFunctionDBUtil
y.commitParamsToDatabase
y.commitReturnToDatabase
y.commitParamsToDatabase(high, True, ghidra.program.model.symbol.SourceType.USER_DEFINED)
y.commitReturnToDatabase(high, ghidra.program.model.symbol.SourceType.USER_DEFINED)
after this i should just be able to get f.getParameters(). It works like 70% of the time. But some cases I get bad results
Oh, you are using the high function for parameters.
I recommend avoiding directly committing to the database.
Here is the approach I recommend.
Run default enabled analyzers, modifying options if necessary. Leave the parameter id and calling convention analyzers disabled.
After the initial analysis is performed run any additional preferred means of identifying known functions such as yara signatures for example.
Apply the FunctionDefinitionDataTypes with the class linked in an above comment.
Make a second pass with only the reference analyzers enabled. (Data reference, constant reference, etc)
Now make a third pass with only the decompiler anyzers enabled. (Calling Convention ID, Parameter ID, etc). Enable the commit void return types option.
Results may vary
The above steps should yield reasonable results as the parameter types of known functions will have propagated through to data and then to non known function variables and parameters through references to the data and to the known function calls. There should be a method in the FlatProgramAPI to get a function by its name and namespace. You should use that method. If you don't need the decompiled function then you don't need to decompile it. You can check the function parameters and signature directly through the "low" Function class. Its parameters and return type will be identical to the high function one the parameters and return type have been committed to the database (done by the parameter id analyzer)
When you directly commit the parameters and return type to the database you are introducing a larger margin of error as the undefined datatypes will be propagated and the parameter id analyzer will skip over the function during analysis since everything is already committed.
I understand that your end goal is to do everything by headless analysis but you should still use the gui interface. It will get you familiar with how ghidra works as well as what analyzers and scripts are available and their options.
If you are uncertain how to do something and can't find the answer from a Google search you can send me an email before opening an issue if you'd like. I have it on my github profile publicly viewable. The main thing I'm not familiar with is headless analysis as I've never had the need for it. I don't do much in batches and it is difficult to truly follow data and function flow without looking at it.
Wow. This is a lot of information. Thanks a lot. I'll definitely take your advice and do it in the gui so I can get a feel for it. However, in the end, there shouldn't be anything preventing me from doing it headless right?
Wow. This is a lot of information. Thanks a lot. I'll definitely take your advice and do it in the gui so I can get a feel for it. However, in the end, there shouldn't be anything preventing me from doing it headless right?
There shouldn't be.
Thanks @astrelsky . You are quite the ghidra beast
@astrelsky Sorry, I haven't had the chance to get look at this until now. However,I am not having great results. Still using the aforementioned code, it is only returning a long. Here is what I did (note: i did the python part in ghidra_bridge) :
After all that it just returned a long, not the struct I created
You are creating a FunctionDefinitionDataType from a function with the wrong signature. The definitions should be used for library functions and functions whose signature is known in advance. You need to make them ahead of time.
There is no way for ghidra to know about your struct unless it has been applied somewhere and determined to be a parameter/return type through type propagation.
You are creating a FunctionDefinitionDataType from a function with the wrong signature. The definitions should be used for library functions and functions whose signature is known in advance. You need to make them ahead of time.
So if I had this binary given to me out of thin air there would be no way to know? I guess I dont fully understand what you are saying, sorry.
You are creating a FunctionDefinitionDataType from a function with the wrong signature. The definitions should be used for library functions and functions whose signature is known in advance. You need to make them ahead of time.
So if I had this binary given to me out of thin air there would be no way to know? I guess I dont fully understand what you are saying, sorry.
Without any sort of debug information there would be no way to know.
@astrelsky would there be anything outside of ghidra that would work ? It sucks that it works like 70% but not perfectly
@astrelsky would there be anything outside of ghidra that would work ? It sucks that it works like 70% but not perfectly
No. You cannot expect anything to be able to pull 100% accurate information out of thin air.
There may be tools that could determine it to be a struct but it would be impossible for it's name and members names to be correct.
@astrelsky has answered this thoroughly.
Reopening this issue, as I'm pretty sure there is in fact a bug (or is it a poor feature?) in getParameters(). This function returns false negatives (doesn't not detect actual parameters) when parameters are not copied onto the stack. For example, for something like this code:
#include<stdio.h>
void print(int a){
printf("My number: %d\n", a);
}
void print_proxy(int b){
print(b);
}
int main(void){
int x = 5;
print_proxy(x);
return 0;
}
Ghidra's getParameters() seems to think that print_proxy() takes no parameters, because b is never copied onto the stack. Obviously my trivial example above is almost too simple to be of concern, but I have encountered real-world functions that never copy (some) parameters to the stack. I think this is a bug, not a feature. I got around this problem in a script by getting the decompiled code (which seems to be correct, even in this case), and just parsing the parameters out of the decompiled code string. Thoughts? Would you like me to take a crack at patching it?
sometimes getParameters() kind of does a bad job. I have a struct that I want it to recognize, but it doesnt recognize the struct. It doesnt really recognize strings either. It just says they are undefined8. Is there anyway to improve the fidelity of this?