Open justanotheranonymoususer opened 9 months ago
This request speaks to a larger requirement: being able to provide custom analyzer options to ghidriff, which I have been meaning to do and which shouldn't be too hard, as I already set some custom ones.
For example, if you save the options for the screenshot above, it generates a custom options file like:
{
  "SAVE_STATE_NAME": "File_Options",
  "VALUES": {
    "WindowsPE x86 Propagate External Parameters": true,
    "Aggressive Instruction Finder": true,
    "PDB Universal.Search remote symbol servers": true,
    "Condense Filler Bytes": true,
    "Decompiler Parameter ID": true,
    "Variadic Function Signature Override": true,
    "PDB MSDIA": true
  },
  "TYPES": {
    "WindowsPE x86 Propagate External Parameters": "boolean",
    "Aggressive Instruction Finder": "boolean",
    "PDB Universal.Search remote symbol servers": "boolean",
    "Condense Filler Bytes": "boolean",
    "Decompiler Parameter ID": "boolean",
    "Variadic Function Signature Override": "boolean",
    "PDB MSDIA": "boolean"
  },
  "ENUM_CLASSES": {}
}
I think in short order I could support that in ghidriff, as a command line option to supply custom analysis. What do you think?
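To make the idea concrete, here is a minimal sketch of how a saved options file with the VALUES/TYPES layout shown above could be read into a plain dictionary before handing it to the analyzer. The helper name `load_analysis_options` is hypothetical and is not part of ghidriff or Ghidra; only the "boolean" type that appears in the example is validated, and anything else is passed through unchanged.

```python
import json


def load_analysis_options(text):
    """Return {option name: value} from a saved options file (hypothetical helper).

    Assumes the layout shown in the thread: a "VALUES" map and a parallel
    "TYPES" map keyed by the same option names.
    """
    data = json.loads(text)
    values = data.get("VALUES", {})
    types = data.get("TYPES", {})
    options = {}
    for name, value in values.items():
        if name not in types:
            raise ValueError(f"option {name!r} has no entry in TYPES")
        # Only "boolean" appears in the example file, so only that is checked.
        if types[name] == "boolean" and not isinstance(value, bool):
            raise TypeError(f"option {name!r} is declared boolean but is {value!r}")
        options[name] = value
    return options


example = """
{
  "SAVE_STATE_NAME": "File_Options",
  "VALUES": {"PDB MSDIA": true, "Condense Filler Bytes": true},
  "TYPES": {"PDB MSDIA": "boolean", "Condense Filler Bytes": "boolean"},
  "ENUM_CLASSES": {}
}
"""
options = load_analysis_options(example)
```

How the resulting dictionary would then be applied to a Ghidra program is left open here, since that depends on how ghidriff ends up wiring it in.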
Alternatively, for now, if you want to try a file you have already analyzed in Ghidra, just export each binary to the Ghidra Zipped format. See the latest release picture. You can export the binaries to my_large_bin1.gzf and my_large_bin2.gzf. Then you can pass the already analyzed bins to ghidriff for diffing:
ghidriff my_large_bin1.gzf my_large_bin2.gzf
I only just put this out, though, so I am curious about the results. Let me know if you try it and if it works for you. Based on your feedback, I'll likely create a ticket to support custom analysis options generally.
"I think in short order I could support that in ghidriff" - sounds good, maybe sth like:
--analysis-option="PDB MSDIA=true"
Or a json that will be used to override options.
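A rough sketch of what such a repeatable NAME=VALUE flag could look like with argparse. The `--analysis-option` flag is only the proposal from this comment, not an existing ghidriff option, and `parse_option` / `build_parser` are made-up names for illustration.

```python
import argparse


def parse_option(text):
    """Split 'NAME=VALUE' on the first '=' and convert true/false to bool."""
    name, sep, value = text.partition("=")
    name = name.strip()
    if not sep or not name:
        raise argparse.ArgumentTypeError(f"expected NAME=VALUE, got {text!r}")
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return name, lowered == "true"
    # Non-boolean values are kept as strings in this sketch.
    return name, value


def build_parser():
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "--analysis-option",
        action="append",          # the flag may be given several times
        type=parse_option,
        metavar="NAME=VALUE",
        help="override one analysis option (hypothetical flag)",
    )
    return parser


args = build_parser().parse_args(["--analysis-option=PDB MSDIA=true"])
overrides = dict(args.analysis_option or [])
```

Splitting on the first "=" keeps option names with spaces or dots (such as "PDB Universal.Search remote symbol servers") intact.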
"if you want to try your already analyzed file in Ghidra" - frankly I already used bindiff, but I'll try that later.
Download of pdbs always fails for me, I had to use another tool to download:
INFO | ghidriff | Setting up Symbol Server for symbols...
INFO | ghidriff | path: ghidriffs\symbols level: 1
INFO | ghidriff | Symbol Server Configured path: SymbolServerService:
symbolStore: LocalSymbolStore: [ rootDir: C:\Users\User\Desktop\diff2\ghidriffs\symbols, storageLevel: -1],
symbolServers:
HttpSymbolServer: [ url: https://msdl.microsoft.com/download/symbols/, storageLevel: -1]
HttpSymbolServer: [ url: https://chromium-browser-symsrv.commondatastorage.googleapis.com/, storageLevel: -1]
HttpSymbolServer: [ url: https://symbols.mozilla.org/, storageLevel: -1]
HttpSymbolServer: [ url: https://software.intel.com/sites/downloads/symbols/, storageLevel: -1]
HttpSymbolServer: [ url: https://driver-symbols.nvidia.com/, storageLevel: -1]
HttpSymbolServer: [ url: https://download.amd.com/dir/bin/, storageLevel: -1]
INFO Connecting to https://msdl.microsoft.com/download/symbols/ (ConsoleTaskMonitor)
INFO Success (ConsoleTaskMonitor)
INFO Storing <XXX>.pdb in local symbol store (338.91MB) (ConsoleTaskMonitor)
WARN SymbolServerService: error copying file https://msdl.microsoft.com/download/symbols/<XXX>.pdb/<YYY>/<XXX>.pdb to C:\Users\User\Desktop\diff2\ghidriffs\symbols: closed (SymbolServerService)
INFO Connecting to https://msdl.microsoft.com/download/symbols/ (ConsoleTaskMonitor)
INFO Success (ConsoleTaskMonitor)
INFO Storing <XXX>.pdb in local symbol store (338.91MB) (ConsoleTaskMonitor)
Then I got this assert:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Scripts\ghidriff.exe\__main__.py", line 7, in <module>
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\ghidriff\__main__.py", line 82, in main
pdiff = d.diff_bins(diff[0], diff[1])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\ghidriff\ghidra_diff_engine.py", line 1170, in diff_bins
assert sym_count_diff < 4000, f'Symbols counts between programs ({p1.name} and {p2.name}) are too high {sym_count_diff}! Likely bad analyiss or only one binary has symbols! Check Ghidra analysis or pdb! Add --force-diff to ignore this assert'
^^^^^^^^^^^^^^^^^^^^^
AssertionError: Symbols counts between programs (<XXX>_1.dll and <XXX>-2.dll) are too high 82149! Likely bad analyiss or only one binary has symbols! Check Ghidra analysis or pdb! Add --force-diff to ignore this assert
BTW typo: analyiss
I added --force-diff, now it seems to be working, I'm waiting for it to complete.
Symbols counts between programs (<XXX>_1.dll and <XXX>-2.dll) are too high 82149!
If one version has symbols and the other doesn't, it becomes difficult to match the functions because Ghidra will have a different set of functions for each binary. So sometimes functions won't be aligned. That assertion is there to let you know you are stepping into a diff that might not work.
That being said, I have seen even partial diffs be useful. There is also an option to run without symbols (which sometimes works best when the analysis differs that much with and without symbols). It all depends.
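For readers following along, the guard described here can be sketched roughly as below. The 4000 threshold and the --force-diff escape hatch come from the assert in the traceback; computing the difference as the absolute gap between the two symbol counts is an assumption, and `check_symbol_counts` is an illustrative name rather than ghidriff's actual code.

```python
def check_symbol_counts(old_count, new_count, limit=4000, force_diff=False):
    """Return the symbol count difference, refusing to continue if it is too large.

    Assumption: the difference is the absolute gap between the two counts.
    """
    sym_count_diff = abs(old_count - new_count)
    if sym_count_diff >= limit and not force_diff:
        raise RuntimeError(
            f"Symbol counts differ by {sym_count_diff}; likely bad analysis or "
            "only one binary has symbols. Pass force_diff=True to continue anyway."
        )
    return sym_count_diff
```

With a gap like the 82149 reported above, the check only passes when the force option is set, which matches what happened in this thread.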
Did the diff finish?
If one version has symbols and the other doesn't
I don't think that's the case, file size is similar. Here are both files: old: https://msdl.microsoft.com/download/symbols/windows.ui.xaml.dll/9C04CA1E1226000/windows.ui.xaml.dll new: https://msdl.microsoft.com/download/symbols/windows.ui.xaml.dll/A6D203221226000/windows.ui.xaml.dll
Did the diff finish?
It failed with:
...
INFO | ghidriff | Completed 5111 at 95%
WARNING| ghidriff | Code diff type not appended for ?close_reset@?$close_invoke_helper@$00P6AXPEAX@_E$1?ReleaseMutex@details@wil@@YAX0@ZPEAX@details@wil@@SAXPEAX@Z due to jumptable decomp issue
WARNING| ghidriff | Code diff type not appended for ?close_reset@?$close_invoke_helper@$00P6AXPEAX@_E$1?CloseHandle@details@wil@@YAX0@ZPEAX@details@wil@@SAXPEAX@Z due to jumptable decomp issue
WARNING| ghidriff | Code diff type not appended for ?OSMemoryFree@XcpAllocation@@YAXPEAX@Z due to jumptable decomp issue
WARNING| ghidriff | Code diff type not appended for ?OSMemoryFree@XcpAllocation@@YAXPEAX@Z due to jumptable decomp issue
WARNING| ghidriff | Code diff type not appended for ?OSMemoryFree@XcpAllocation@@YAXPEAX@Z due to jumptable decomp issue
WARNING| ghidriff | Code diff type not appended for ?ReleaseWeak@control_block@details@xref@@QEAAIXZ due to jumptable decomp issue
WARNING| ghidriff | Code diff type not appended for ?_Tidy@?$vector@Vxstring_ptr@@V?$allocator@Vxstring_ptr@@@std@@@std@@AEAAXXZ due to jumptable decomp issue
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Scripts\ghidriff.exe\__main__.py", line 7, in <module>
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\ghidriff\__main__.py", line 82, in main
pdiff = d.diff_bins(diff[0], diff[1])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\ghidriff\ghidra_diff_engine.py", line 1446, in diff_bins
pdiff['old_pe_url'] = self.get_pe_download_url(old, pdiff['old_meta'][pe_key])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\ghidriff\ghidra_diff_engine.py", line 820, in get_pe_download_url
pe_info = get_pe_extra_data(path)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\ghidriff\utils.py", line 41, in get_pe_extra_data
machine = unpack('<H', word)[0]
^^^^^^^^^^^^^^^^^^
struct.error: unpack requires a buffer of 2 bytes
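Ah, it seems like the pe_url generation is failing for that binary.
That isn't a critical function; it just gives you a nice wget command line for the original binary.
Like this: image.png https://github.com/clearbluejar/ghidriff/assets/3752074/26971955-f1cf-417f-b36a-364aa75fe45e
Which seems like another issue to resolve. :)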
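As a sketch of how that URL generation could tolerate a short or odd file instead of raising struct.error: the code below reads the PE header fields defensively and builds a download URL in the same shape as the two windows.ui.xaml.dll links posted earlier (file name, then TimeDateStamp and SizeOfImage in hex). This is not ghidriff's get_pe_extra_data; `read_pe_fields` and `symbol_server_url` are hypothetical names, and what actually caused the short read in this case is not established in the thread.

```python
import struct


def read_pe_fields(data):
    """Return (machine, timestamp, size_of_image), or None if data is not a usable PE."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew at 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # Need: signature (4) + COFF header (20) + optional header up to SizeOfImage (56 + 4).
    if len(data) < pe_offset + 24 + 60:
        return None
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    # COFF header starts right after the signature: Machine, NumberOfSections, TimeDateStamp.
    machine, _sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    # SizeOfImage sits at offset 56 of the optional header for both PE32 and PE32+.
    (size_of_image,) = struct.unpack_from("<I", data, pe_offset + 24 + 56)
    return machine, timestamp, size_of_image


def symbol_server_url(name, timestamp, size_of_image,
                      base="https://msdl.microsoft.com/download/symbols"):
    """Build a download URL shaped like the ones quoted in this thread."""
    return f"{base}/{name}/{timestamp:08X}{size_of_image:x}/{name}"
```

Returning None on a short read lets the caller simply skip the optional wget line rather than aborting a diff that has otherwise finished.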
Storing windows.ui.xaml.pdb in local symbol store (338.91MB) (ConsoleTaskMonitor)
The PDB for the binary is 350 MB! wow.
And the binary is 18MB...
I just kicked off a local test. I will see if it survives it.
That's not so large, chromium pdbs are several GBs
This is how analysis is going: image.png https://github.com/clearbluejar/ghidriff/assets/3752074/afd2f19f-d610-452b-95e3-a23ab1f0a4f3
I ran out of heap and actually crashed the JVM. This is Ghidra analysis (before ghidriff is doing any work). I can bump up the heap for the JVM, but how much will I need? How much RAM are you working with? I can also turn off threading with --no-threaded so it only analyzes one binary at a time. Trying again.
Did you use MSDIA? As for RAM, I think I used 16GB.
ah no, just using command-line on linux, regular pdb universal. maybe it can't handle it...
Yeah, that's the issue I linked at the beginning
Full circle. 🤦‍♂️ Sorry.
I have yet to use MSDIA with Ghidra. Besides the analysis option needed, and having to run it on Windows (because that is a requirement for MSDIA, right?), is there anything else you need to run on the PDB to make it work? Or is MSDIA just another parser for the PDB that handles large ones better, so no preprocessing is needed and it can just run with the original PDB?
I think MSDIA is just another parser for the PDB that handles large ones better, so there is no preprocessing needed. And probably Windows only indeed, but I'm not sure.
Will need to get back to you when I can test with Windows. I will try to add the options json import to enable all the Ghidra analysis settings.
Now that Ghidra 11 is released with some PDB improvements, maybe it won't OOM anymore. Worth trying.
For large binaries, Universal fails with OOM. See: https://github.com/NationalSecurityAgency/ghidra/issues/2485
For this reason I couldn't try this tool with my binary.
Please add a command line option to switch to MSDIA.