Closed subreption-research closed 1 month ago
We have had some discussions about making all ExtensionPoints "scriptable", so you can distribute an analyzer/loader/filesystem source file instead of a heavyweight prebuilt extension. These are just initial discussions, though; no work has been planned yet. But that's the level at which we'd likely want to tackle the problem, so more than just analyzers would benefit. Ideally this would work for Python source too.
We recently completed the YARA analyzer (in Java), and it seems feasible to explore the options in PyGhidra for creating a "fabric" between Analyzers and the Python side. The main issue with Java extensions is the maintenance burden of their dependencies, especially native ones (since we need to build for OS X, Linux and Windows).
A realistic first milestone could be writing the loader and event handler to support the methods for Analyzer classes. We will look into this when time permits. Supporting the core functionality isn't too daunting but handling corner cases properly might be (for example cancelling the Analyzer gracefully).
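The graceful-cancellation corner case could be handled cooperatively on the Python side. Below is a minimal sketch in plain Python, assuming a monitor object whose `checkCancelled()` method raises when the user cancels. `CancelledError`, `FakeMonitor`, and `run_analysis` are hypothetical stand-ins for illustration, not Ghidra or PyGhidra APIs.

```python
class CancelledError(Exception):
    """Hypothetical stand-in for the exception a task monitor raises on cancel."""


class FakeMonitor:
    """Hypothetical stand-in for a task monitor; cancels after `limit` checks."""

    def __init__(self, limit):
        self.limit = limit
        self.checks = 0

    def checkCancelled(self):
        self.checks += 1
        if self.checks > self.limit:
            raise CancelledError("analysis cancelled")


def run_analysis(items, monitor, cleanup):
    """Process items one by one, polling the monitor before each unit of work.

    Returns (processed_items, completed). `cleanup` always runs, so partial
    state can be released whether the analysis finishes or is cancelled.
    """
    processed = []
    try:
        for item in items:
            monitor.checkCancelled()  # cooperative cancellation point
            processed.append(item * 2)  # placeholder for real work
        return processed, True
    except CancelledError:
        return processed, False
    finally:
        cleanup()
```

The point of the design is that the work is split into small units with a cancellation check before each, and that cleanup lives in a `finally` block, so it runs on both the normal and the cancelled path.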
It would be helpful to put together more documentation for the new PyGhidra capabilities.
Will comment on #6781 for the Yara extension progress meanwhile.
Maybe this is what you're looking for?
This seems to be limited to Java extensions/external code. What we would like to have is an entire Python-side layer that integrates seamlessly into the Analyzer process, so that we can write Analyzer classes in Python and handle the methods there (options, added, ended, etc.), with no functional differences from a compiled Analyzer extension. This would also immediately expose the OS libraries and Python modules, making things easier in the long run.
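As a rough illustration of the shape such a Python-side Analyzer might take, here is a sketch in plain Python. `PythonAnalyzer`, `StringCounter`, and every method name are hypothetical, loosely modeled on the hooks mentioned above (options, added, ended); none of this is an existing PyGhidra API, and a list of byte strings stands in for a real program.

```python
from abc import ABC, abstractmethod


class PythonAnalyzer(ABC):
    """Hypothetical base class for an analyzer written entirely in Python."""

    name = "unnamed"

    def register_options(self, options):
        """Declare default option values; `options` is a plain dict here."""

    def options_changed(self, options):
        """React to option values changed by the user."""

    @abstractmethod
    def added(self, program, address_set, monitor, log):
        """Called when new addresses are ready to be analyzed."""

    def analysis_ended(self, program):
        """Called once auto-analysis has finished."""


class StringCounter(PythonAnalyzer):
    name = "String Counter"

    def __init__(self):
        self.min_length = 4
        self.count = 0

    def register_options(self, options):
        options.setdefault("min_length", self.min_length)

    def options_changed(self, options):
        self.min_length = options["min_length"]

    def added(self, program, address_set, monitor, log):
        # `program` is just a list of byte strings in this sketch
        for data in program:
            if len(data) >= self.min_length:
                self.count += 1
        log.append(f"{self.name}: {self.count} strings")
        return True
```

A bridging layer would then be responsible for translating calls from the Java auto-analysis manager into these methods; that translation is the part that does not exist yet.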
Looks like the ClassSearcher functionality would need to be "extendable" such that Python can locate and provide instances of the requested ExtensionPoint interfaces. You can't instantiate a proxy class in Java, so I think the getClasses methods would be unusable outside of the Java case.
Probably have to do something like this. I whipped this up in about 30 minutes, so it's probably full of flaws.
```python
import importlib
import importlib.metadata
import pkgutil
import typing

import jpype
from java.lang import UnsupportedOperationException

# maps a Java interface (JClass) to the Python classes registered for it
_ExtensionPoints = dict()


def load_subpackages(monitor, pkg):
    for subpkg in pkgutil.iter_modules(pkg.__path__):
        monitor.checkCancelled()
        if subpkg.ispkg:
            # iter_modules yields ModuleInfo objects, so build the dotted name
            importlib.import_module(f"{pkg.__name__}.{subpkg.name}")


def ExtensionPoint(extension: typing.Union[jpype.JClass, str]):
    def wrapper(cls):
        # JImplements returns a decorator, so it has to be applied to cls
        cls = jpype.JImplements(extension)(cls)
        # only add it if it succeeds
        key = extension
        if not isinstance(key, jpype.JClass):
            key = jpype.JClass(key)
        # should be a collection sorted by priority
        _ExtensionPoints.setdefault(key, set()).add(cls)
        return cls
    return wrapper


# this isn't an interface, I'm pretending it is to present the idea
@jpype.JImplements("ghidra.util.classfinder.ClassSearcher")
class ClassSearcher:

    @jpype.JOverride
    def search(self, monitor):
        # not as efficient as the Java searcher because we have to load the modules
        for entry in importlib.metadata.entry_points(group='pyghidra.extension_points'):
            monitor.checkCancelled()
            try:
                # load all packages and subpackages
                # use of the ExtensionPoint decorator will register them accordingly
                load_subpackages(monitor, entry.load())
            except Exception as e:
                # log in Ghidra log
                pass

    @jpype.JOverride
    def getClasses(self, *args):
        raise UnsupportedOperationException()

    @jpype.JOverride
    def getInstances(self, extension):
        if not isinstance(extension, jpype.JClass):
            extension = jpype.JClass(extension)
        return [cls() for cls in _ExtensionPoints.get(extension, [])]
```
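The comment about keeping registrations sorted by priority could be addressed with a small ordered registry. The following sketch is plain Python with no JPype dependency. `register`, `get_instances`, and the `priority` keyword are hypothetical names, string keys stand in for the Java interface classes, and treating a lower number as "runs first" is an assumption.

```python
import bisect
from collections import defaultdict

# interface name -> list of (priority, sequence, class), kept sorted
_registry = defaultdict(list)
_sequence = 0


def register(interface, priority=0):
    """Class decorator that records `cls` under `interface`, lowest priority value first."""

    def wrapper(cls):
        global _sequence
        _sequence += 1
        # the sequence number breaks ties, so classes themselves are never compared
        bisect.insort(_registry[interface], (priority, _sequence, cls))
        return cls

    return wrapper


def get_instances(interface):
    """Instantiate every registered class for `interface`, in priority order."""
    return [cls() for _, _, cls in _registry.get(interface, [])]
```

Classes registered with equal priority come back in registration order, which keeps the result deterministic across runs.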
Currently, if we are not mistaken, there isn't a mechanism to dynamically load or plug Python extensions into the auto-analysis process. Extensions are required to be written in Java, which is not necessarily a problem but it is a maintenance and end-user burden sometimes.
Ideally, we would like to see (or contribute to) a standardized API/mechanism that can load Python extensions providing auto-analysis functionality, with their own settings integrated into the existing configuration handling, and the possibility of adding widgets or UI elements programmatically.
This could be done through static variables and callbacks, with no direct widget-related calls from the Python side. For example, an extension might define N tabs populated through a dictionary, each with settings that are assigned a unique ID and can be translated into settings that are saved "as is". That removes the complexity of bridging widget/UI control.
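One way to picture this declarative approach is sketched below. It rests on assumptions: the `TABS` layout, the `flatten_settings` helper, and the `example.*` ID scheme are all hypothetical, chosen only to show how a dictionary of tabs could be reduced to flat key/value settings that a host application could store as is.

```python
# Hypothetical declarative description: tab title -> {setting ID: spec}
TABS = {
    "General": {
        "example.rules_dir": {"label": "Rules directory", "type": str, "default": ""},
        "example.timeout": {"label": "Timeout (s)", "type": int, "default": 60},
    },
    "Advanced": {
        "example.fast_mode": {"label": "Fast mode", "type": bool, "default": False},
    },
}


def flatten_settings(tabs, overrides=None):
    """Return {setting ID: value}, checking that IDs are unique and values well-typed."""
    overrides = overrides or {}
    flat = {}
    for tab, settings in tabs.items():
        for setting_id, spec in settings.items():
            if setting_id in flat:
                raise ValueError(f"duplicate setting ID {setting_id!r} in tab {tab!r}")
            value = overrides.get(setting_id, spec["default"])
            if not isinstance(value, spec["type"]):
                raise TypeError(f"{setting_id} expects {spec['type'].__name__}")
            flat[setting_id] = value
    unknown = set(overrides) - set(flat)
    if unknown:
        raise KeyError(f"unknown setting IDs: {sorted(unknown)}")
    return flat
```

Because the extension only declares data, the host side remains free to decide how each tab and setting is rendered, and the Python code never touches a widget.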
The initial design could be as simple as providing the following callbacks:
In our case this idea was floated by one of our developers related to #6781.
The reason for not limiting such extensions to a script or similar is mostly related to the additional steps in running them, and the fact that the scripting capabilities seem more like a feature to allow for small ad-hoc operations, and have grown to be a relatively disorganized repository of one-off solutions. This might be debatable, but it can be argued that more seamless integration will open the path to better integration of more complex tooling. Ultimately, it's a quality of life issue.