Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
937 stars 213 forks source link

Analysis time of a function is only added up and never reset if the analysis update is not user update #6171

Open xusheng6 opened 1 day ago

xusheng6 commented 1 day ago

While working on https://github.com/mandiant/capa/issues/2406#issuecomment-2490171179, I noticed that the MLIL of function 0x467464 can normally be retrieved after a few seconds. But when binja was used in capa, it cannot retrieve the IL, thus leading to a crash.

Normal run:

>>> from binaryninja import *
>>> bv = binaryninja.load('/Users/xusheng/Downloads/112f9f0e8d349858a80dd8c14190e620.exe_.')
>>> func = bv.get_function_at(0x467464)
>>> func
<func: x86@0x467464>
>>> func.mlil
<MediumLevelILFunction: x86@0x467464>

It does take func.mlil a few seconds to run, but everything works as expected. We need to see why it is not working in the case of capa

Related to https://github.com/Vector35/binaryninja-api/issues/6170

xusheng6 commented 1 day ago

112f9f0e8d349858a80dd8c14190e620.exe_.zip

xusheng6 commented 1 day ago

Also see 0x8091b80 in b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_.zip

xusheng6 commented 1 day ago

capa stack trace:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/xusheng/capa/capa/main.py", line 1103, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/xusheng/capa/capa/main.py", line 994, in main
    capabilities, counts = find_capabilities(rules, extractor, disable_progress=args.quiet)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xusheng/capa/capa/capabilities/common.py", line 75, in find_capabilities
    return find_static_capabilities(ruleset, extractor, disable_progress=disable_progress, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xusheng/capa/capa/capabilities/static.py", line 168, in find_static_capabilities
    function_matches, bb_matches, insn_matches, feature_count = find_code_capabilities(ruleset, extractor, f)
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xusheng/capa/capa/capabilities/static.py", line 114, in find_code_capabilities
    for bb in extractor.get_basic_blocks(fh):
  File "/Users/xusheng/capa/capa/features/extractors/binja/extractor.py", line 58, in get_basic_blocks
    for mlil_bb in f.mlil.basic_blocks:
                   ^^^^^^
  File "/Applications/Binary Ninja.app/Contents/Resources/python/binaryninja/function.py", line 1039, in mlil
    raise ILException(f"Medium level IL was not loaded for {self!r}")
binaryninja.exceptions.ILException: Medium level IL was not loaded for <func: x86@0x8082d40>

And the Python script that works:

from binaryninja import *
bv = binaryninja.load('/Users/xusheng/Downloads/b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_.bndb')
func = bv.get_function_at(0x8082d40)
print(func.mlil)
xusheng6 commented 1 day ago

I checked the analysis skip reason for the function is ExceedFunctionAnalysisTimeSkipReason

But it still does NOT look right, because the analysis of that function takes like 2 seconds, there is no way the 20 seconds default time can be exceeded.

Update: I printed the analysis time and it is indeed questionable:

analysis skipped: 3, analysis info: {'Total': 0.002902333}
analysis skipped: 3, analysis info: {'Total': 0.0081085}

The analysis time of the two skipped functions are all very short

For comparison, I wrote a small script that iterates over all functions and retrieve the MLIL of each function. It took roughly 0.7 seconds to generate the MLIL for function 0x8082d40

xusheng6 commented 1 day ago

I made some interesting discovery in https://github.com/mandiant/capa/issues/2402#issuecomment-2491164752. However, that still does not explain why the analysis could time-out on a not-so-challenging function at all

xusheng6 commented 13 hours ago

OK so I finally figured out what is happening here. So we are only resetting the function analysis time if the analysis is a user update. If it is always auto-update, then we never reset the timer and the time just gets added up, eventually exceeding the 20 seconds threshold. Here is a brief explanation of what happens:

  1. The script retrieves the IL of function 0x8082d40
  2. It took 0.7 seconds to update the analysis and generate the IL
  3. The IL of function 0x8082d40 is cached
  4. The script requests the IL of some other 60 functions, exceeding the number of analysis.limits.cacheSize
  5. The cached IL of function 0x8082d40 is discarded
  6. The script retrieves the IL of function 0x8082d40 again
  7. It took another 0.7 seconds to update the analysis and generate the IL. The two times are added up
  8. Repeat the process many times until the total analysis time of 0x8082d40 is larger than 20 seconds
  9. The analysis time of 0x8082d40 exceeds analysis.limits.maxFunctionAnalysisTime
  10. The analysis bails out with a reason ExceedFunctionAnalysisTimeSkipReason

I understand we are probably only resetting the total analysis time on user updates for a good reason, and this cached-related repetitive generation of the IL is probably never encountered before