NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

SymbolicPropogator.flowConstants is useless #5670

Status: Open. astrelsky opened this issue 1 year ago.

astrelsky commented 1 year ago

Is your feature request related to a problem? Please describe.
I'm always frustrated when I want to use the SymbolicPropogator but can't, because the correct ConstantPropagationContextEvaluator for the current processor isn't available.

Describe the solution you'd like
Expose a method for getting the correct ConstantPropagationContextEvaluator for the current processor.

Describe alternatives you've considered
Dark arts of reflection.

Additional context
https://github.com/NationalSecurityAgency/ghidra/blob/master/Ghidra/Processors/MIPS/src/main/java/ghidra/app/plugin/core/analysis/MipsAddressAnalyzer.java#L251

emteere commented 1 year ago

Are you wanting to use it outside of the analysis pipeline by calling it directly, say in a script?

The SymbolicPropogator is used for many things that aren't necessarily processor-specific, and those uses don't need or want the processor-specific analyzer because general references aren't being added.

There is a basic constant analyzer that is used when there isn't a processor-specific one. One caveat is that a processor-specific analyzer disables the basic constant analyzer, but the basic one can be turned on for any processor.

Each more specific constant analyzer registers with the BasicConstantAnalyzer. I think we could expose that in some fashion.
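To make the idea concrete, here is a minimal Java sketch of what exposing that registration could look like: processor-specific analyzers register an evaluator factory keyed by processor, and a lookup falls back to the basic evaluator when nothing is registered. Everything in it (`EvaluatorSketch`, `EvaluatorRegistry`, `register`, `evaluatorFor`, and keying by a processor-name string) is hypothetical and only stands in for the real ConstantPropagationContextEvaluator and analyzer classes; it is not Ghidra's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a ConstantPropagationContextEvaluator (not Ghidra's API).
interface EvaluatorSketch {
    String name();
}

// Hypothetical registry: processor-specific analyzers register a factory for their
// evaluator; lookups fall back to the basic evaluator when none is registered.
final class EvaluatorRegistry {
    private final Map<String, Supplier<EvaluatorSketch>> factories = new HashMap<>();
    private final Supplier<EvaluatorSketch> basic;

    EvaluatorRegistry(Supplier<EvaluatorSketch> basic) {
        this.basic = basic;
    }

    // Would be called by a processor-specific analyzer when it is set up.
    void register(String processor, Supplier<EvaluatorSketch> factory) {
        factories.put(processor, factory);
    }

    // The method the issue asks for: a fresh evaluator for the given processor,
    // or the basic one if no processor-specific evaluator was registered.
    EvaluatorSketch evaluatorFor(String processor) {
        return factories.getOrDefault(processor, basic).get();
    }
}
```

Handing out a new evaluator from a factory, rather than one shared instance, is a design guess that assumes evaluators may carry per-run state; a script or JUnit test would then call something like `registry.evaluatorFor("MIPS")` instead of reaching into the analyzer with reflection.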

I do agree there should be something that allows you to instantiate and call the analyzer directly. I was just writing a JUnit test, and you can't call the correct analyzer directly, even though in this case I know which analyzer I want for the test.

astrelsky commented 1 year ago

I have wanted to use it in a script and in my own analyzers in the past. This is actually something I have held onto for a few years, and it occurred to me this morning that I had never submitted it.

It's hard to know if the basic one would be sufficient or not. For my use case I wanted to recover constant arguments being passed to functions. Unfortunately, the SymbolicPropogator can only get me the values of register parameters, not ones on the stack, so I had to skip analysis for certain processors such as 32-bit x86. I think I eventually said to hell with it, grabbed the decompiler output, and started processing the clang tokens.

I don't think I could have been any ruder with the issue title. :joy:

emteere commented 1 year ago

Understood. No rudeness received.

In 10.3.x, the SymbolicPropagator tracks stack references as well as register parameters. The decompiler is really good at recovering parameters, given good information.

astrelsky commented 1 year ago

> Understood. No rudeness received.
>
> In 10.3.x, the SymbolicPropagator tracks stack references as well as register parameters.

It might track them, but there is no way to get the constant value at a stack location at a particular instruction, like you can with a register using SymbolicPropogator::getRegisterValue.
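To illustrate the missing query, the toy model below (not Ghidra code) records the constant stack contents before each instruction in straight-line 32-bit x86 code made only of `PUSH imm32` and non-stack instructions, then answers "what constant is at [ESP + offset] before this instruction?", the stack counterpart of what getRegisterValue answers for a register. `ToyStackTracker`, `push`, `other`, and `getStackValue` are hypothetical names, and the real SymbolicPropogator tracks far more state than this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical toy model, not Ghidra's API: tracks constant 32-bit pushes in
// straight-line code and snapshots the stack *before* each instruction.
final class ToyStackTracker {
    // Instruction address -> (offset from current stack pointer -> constant).
    private final Map<Long, Map<Long, Long>> snapshots = new HashMap<>();
    // Absolute stack slots keyed by the simulated stack pointer at store time.
    private final Map<Long, Long> stack = new HashMap<>();
    private long sp = 0; // simulated ESP; the starting value is arbitrary

    // Records the live stack slots relative to the current stack pointer.
    private void snapshot(long insnAddr) {
        Map<Long, Long> rel = new HashMap<>();
        for (Map.Entry<Long, Long> e : stack.entrySet()) {
            long offset = e.getKey() - sp;
            if (offset >= 0) {
                rel.put(offset, e.getValue());
            }
        }
        snapshots.put(insnAddr, rel);
    }

    // Models "PUSH imm32" at insnAddr: ESP -= 4, then [ESP] = value.
    void push(long insnAddr, long value) {
        snapshot(insnAddr);
        sp -= 4;
        stack.put(sp, value);
    }

    // Models an instruction that does not touch the stack (e.g. the CALL itself).
    void other(long insnAddr) {
        snapshot(insnAddr);
    }

    // The requested query: constant at [ESP + offset] just before insnAddr.
    OptionalLong getStackValue(long insnAddr, long offset) {
        Map<Long, Long> rel = snapshots.get(insnAddr);
        if (rel == null || !rel.containsKey(offset)) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(rel.get(offset));
    }
}
```

For `PUSH 0x10; PUSH 0x4000; CALL f`, asking at the CALL for offset 0 yields 0x4000 (the first cdecl argument) and offset 4 yields 0x10, which is exactly the kind of answer needed for stack-passed arguments on 32-bit x86.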

> The decompiler is really good at recovering parameters, given good information.

Unfortunately, from a programmatic standpoint, the information from the decompiler is generally inaccessible and difficult to use. Your options are to parse the clang tokens, or to do the decompiler's job over again and process the high pcode after the decompiler has "simplified" it. I'm of the opinion that you should not need to know how pcode works, or ever have to work with it directly, from a plugin.

slippycheeze commented 1 year ago

FWIW, I've had a little more success using SymbolicPropagator after seeing it mentioned in this issue. I figured I'd share why I used it, and where it did (and didn't) work for me, in the hope it adds value for y'all. Please forgive my presumption if it does not.

Concretely, this was analysis intended to label data based on function calls. Because the target was x86_64 (and, I suppose, through luck), all the arguments I wanted to find so far had been passed in registers, so I didn't run into the problem of obtaining values present on the stack.

The goal was to set the symbol name (label) on globals (and perhaps, in future, on fields) assigned directly from GetProcAddress to the string value passed as the name of the symbol to look up. Given the code:

// decompiler: DAT_18089d6d8 = GetProcAddress(hModule, "lua_gettop");
18046a2ae      MOV               RCX,  qword ptr [LuaEngine_hModule]
18046a2b5      LEA               RDX,  [s_lua_gettop_18061b7c8]
18046a2bc      CALL              qword ptr [->KERNEL32.DLL::GetProcAddress]

I tried using SymbolicPropagator::getRegisterValue to get RDX and then read the string value at that address. Compared to extracting the same information from the ClangTokenGroup the decompiler would return, this was mostly successful, if cumbersome.
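A minimal sketch of the second step, reading the NUL-terminated name at the address recovered for RDX, is below. It models program memory as one flat byte array at a base address; `ToyImage` and `readCString` are hypothetical names, and in an actual script the bytes would come from the program's memory rather than from this toy class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical flat memory image standing in for program memory (not Ghidra's API).
final class ToyImage {
    private final long base;
    private final byte[] bytes;

    ToyImage(long base, byte[] bytes) {
        this.base = base;
        this.bytes = bytes.clone();
    }

    // Reads a NUL-terminated ASCII string at addr (e.g. the value found for RDX
    // before the GetProcAddress call). Empty if addr is outside the image or no
    // terminator appears within maxLen bytes.
    Optional<String> readCString(long addr, int maxLen) {
        long off = addr - base;
        if (off < 0 || off >= bytes.length) {
            return Optional.empty();
        }
        int start = (int) off;
        int end = start;
        while (end < bytes.length && end - start < maxLen && bytes[end] != 0) {
            end++;
        }
        if (end >= bytes.length || bytes[end] != 0) {
            return Optional.empty();
        }
        return Optional.of(new String(bytes, start, end - start, StandardCharsets.US_ASCII));
    }
}
```

For the listing above, assuming the bytes at 0x18061b7c8 are the string its label s_lua_gettop_18061b7c8 suggests, `readCString(0x18061b7c8L, 64)` would return "lua_gettop", which then becomes the name for DAT_18089d6d8.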

The biggest advantage SymbolicPropagator had was that I could ask about "the value at a specific, concrete address in memory", whereas with the decompiler it was ... harder. (I think that's just because the decompiler didn't provide a simple function call, equivalent to getRegisterValue, wrapping the process.)

The same pattern worked well when the argument I wanted was a pointer, for example with _Init_thread_header, which takes a pointer to an int used as a lock, and with other function calls following the same basic pattern. For getting a pointer to something passed as a register argument, it was faster to write code around SymbolicPropagator than around the decompiler output.

The most annoying problem was that it would occasionally return a value in the range [-0xffff, 0xffff], well outside program memory, even where the data flow looked the same as in places where it succeeded, except that the value was not a pointer to a global (e.g., a struct field value, or a parameter passed from the caller into the containing function).

I figured that was just the nature of data flow analysis, though, and moved on. Whenever I got a value that satisfied currentProgram.getMemory().contains(value_as_address), it was always what I, the human, expected, so I could simply filter on that.
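The filter can be sketched as follows. Program memory is modeled as a list of inclusive address ranges, standing in for currentProgram.getMemory().contains(...); `ToyMemoryFilter`, `addBlock`, `contains`, and `plausiblePointers` are hypothetical names. Recovered constants are compared as unsigned 64-bit values, so small negative results such as -0x10 wrap to very high addresses and are rejected along with small positive ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for program memory as inclusive [start, end] blocks,
// mirroring the check currentProgram.getMemory().contains(addr).
final class ToyMemoryFilter {
    private final List<long[]> blocks = new ArrayList<>();

    void addBlock(long start, long endInclusive) {
        blocks.add(new long[] {start, endInclusive});
    }

    // Treats the constant as an unsigned 64-bit address: -0x10 becomes
    // 0xfffffffffffffff0, which lies outside any normal block.
    boolean contains(long value) {
        for (long[] b : blocks) {
            if (Long.compareUnsigned(value, b[0]) >= 0
                    && Long.compareUnsigned(value, b[1]) <= 0) {
                return true;
            }
        }
        return false;
    }

    // Keeps only the recovered values that land inside some memory block.
    List<Long> plausiblePointers(List<Long> values) {
        List<Long> out = new ArrayList<>();
        for (long v : values) {
            if (contains(v)) {
                out.add(v);
            }
        }
        return out;
    }
}
```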

On the decompiler front, the output became much easier to work with once I threw away most of the tokens, but it is still challenging to extract useful information from. I'm pretty sure I'll end up writing something akin to the Harmony 2 CodeMatcher class (for MSIL), minus the mutators, to wrap the patterns of scanning the result for information.
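A read-only matcher in that spirit might start out like the sketch below: it slides a sequence of per-token predicates over a flat list of token strings and reports every starting index where all of them match. The flat list is a simplification (real decompiler output is a tree of ClangToken nodes), and `TokenMatcher`, `findAll`, `is`, `any`, and `stringLiteral` are hypothetical names, not Harmony's or Ghidra's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical read-only matcher over a flat token list (e.g. token text with
// whitespace tokens already dropped). Not a Ghidra or Harmony API.
final class TokenMatcher {
    private final List<String> tokens;

    TokenMatcher(List<String> tokens) {
        this.tokens = tokens;
    }

    // Returns every start index where the predicates match consecutive tokens.
    @SafeVarargs
    final List<Integer> findAll(Predicate<String>... pattern) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + pattern.length <= tokens.size(); i++) {
            boolean ok = true;
            for (int j = 0; j < pattern.length && ok; j++) {
                ok = pattern[j].test(tokens.get(i + j));
            }
            if (ok) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Matches one exact token.
    static Predicate<String> is(String s) {
        return s::equals;
    }

    // Matches any single token.
    static Predicate<String> any() {
        return t -> true;
    }

    // Matches a double-quoted string literal token.
    static Predicate<String> stringLiteral() {
        return t -> t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"");
    }
}
```

For the tokens of `DAT_18089d6d8 = GetProcAddress(hModule, "lua_gettop");`, the pattern any, "=", "GetProcAddress", "(", any, ",", string literal, ")" matches at index 0, so token 0 is the global to rename and token 6 holds the name.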

I agree it is very cumbersome to deal with, though. For processing things like the GetProcAddress calls above, I simply collected all the functions containing a calling instruction, removed duplicates, and processed each function as a whole from the decompiler output without reference to the call site: knowing which address made the call simply wasn't useful any longer.
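The grouping step can be sketched like this: each call-site address is mapped to the function whose address range contains it, and duplicates collapse in a set. Functions are modeled as entry/end pairs in a TreeMap, assuming each function occupies one contiguous range; this is a hypothetical stand-in for a lookup such as FunctionManager.getFunctionContaining in a real script, and `CallSiteGrouper` and its methods are illustrative names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical function table: entry address -> last address (inclusive).
// Assumes contiguous function bodies, which real programs do not guarantee.
final class CallSiteGrouper {
    private final TreeMap<Long, Long> functions = new TreeMap<>();

    void addFunction(long entry, long endInclusive) {
        functions.put(entry, endInclusive);
    }

    // Maps each call site to its containing function's entry, dropping
    // duplicates and sites outside any known function; keeps first-seen order.
    Set<Long> containingFunctions(List<Long> callSites) {
        Set<Long> entries = new LinkedHashSet<>();
        for (long site : callSites) {
            Map.Entry<Long, Long> f = functions.floorEntry(site);
            if (f != null && site <= f.getValue()) {
                entries.add(f.getKey());
            }
        }
        return entries;
    }
}
```

Each entry in the resulting set would then be decompiled once, rather than once per call site.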