Closed clearbluejar closed 2 years ago
Thank you for the detailed breakdown of this issue, and the pull request. Right now, we are concerned with this being a hard-coded list instead of something that will automatically handle any java libraries the user has added. For the time being we will need to hold off on this pull request, but we appreciate you bringing this to our attention as it should be addressed.
Yeah that would be better to dynamically determine the packages loaded by the Ghidra. When I looked, I couldn't see easy access to list all the top level classes loaded besides manually looking at the classpath used to start the JVM. I can't see how you can list them from the jpype
API. There is a call to _jpype.isPackage you can make, but this just tells you whether or not the package was loaded. That is why I settled with the static list. Perhaps there is a dynamic way...
I've done some homework to enable finding dynamic packages instead of the static list.
It now dynamically finds the all the packages loaded with get_runtime_top_level_java_packages.
It seems to find them all:
{'utilities', 'org', 'log', 'resources', 'util', 'net', 'SWIG', 'docking', 'edu', 'java', 'jdk', 'agent', 'com', 'sun', 'mdemangler', 'functioncalls', 'foundation', 'decompiler', 'db', 'javax', 'help', 'utility', 'pdb', 'ghidra', 'generic'}
I had to add the call to _wrap_runtime_top_level_java_packages after the call to Application.initializeApplication(self.layout, config)
. Otherwise, some of the classes loaded at runtime were not being found.
I added to both HeadlessPyhidraLauncher
and DeferredPyhidraLauncher
, but couldn't figure out where it would go for GuiPyhidraLauncher
. Where does the initialization happen in that case?
One thing I noticed is that the dev will need to use the mod name for all the imports. If you use `import mod as modit will work, but the
from mod.submod import funcbreaks down. If the dev sticks with
mod_throughout, it all works the same a
jpype` wrapping. I added a test to demonstrate a from import.
I think this might cut it. I'm anxious to be a contributor, so if something isn't quite right I'd be happy to fix. Or if you would like to modify my submission please feel free. At any rate, this project is amazing. Thank you for it.
We think it should be possible to allow this to keep close to the standard python importlib logic by creating a class extending Loader and MetaPathFinder.
Only the find_spec
and create_module
methods need to be implemented. The exec_module
method must also be defined but you can just pass or return None
. An instance of this class should be appended to sys.meta_path
in PyhidraLauncher.start
just before launching Ghidra. By appending it to the meta_path the existing python and jpype import machinery will take precedence.
Implementing find_spec
should be rather straightforward. When implementing create_module
you'll want to manually create a jpype.JPackage
, add it sys.modules
and return the created JPackage
. This will allow the module to be imported and used with the trailing underscored while handing control of the module back to jpype.
We greatly appreciate your contributions and would appreciate it if you have the time to implement it. We may not get back to you for review until Monday.
This will allow the module to be imported and used with the trailing underscored while handing control of the module back to jpype.
I think the current implementation might be doing exactly as you ask, without adding a custom loader. In essence, the code now leverages the jpype
import machinery and simply wraps every runtime base Java package with a '_'. It takes advantage of the imports.registerDomain aliasing functionality by wrapping each base Java package (whether they conflict or not) and making them available with an '_'.
All non-conflicting base classes can be used as normal, but they are also now available with an '_'.
import ghidra # works
import ghidra_ # also works, but is unnecessary
# conflicting classes also work
# the convention for the dev will just need to be to use the mod_ for conflicts
import pdb_
from pdb_ import PdbParser
from pdb_.symbolserver import HttpServer
I could write a custom loader, but I think it would be difficult to best jpype
's _JImportLoader and handle all the edge cases.
class _JImportLoader:
""" (internal) Finder hook for importlib. """
def find_spec(self, name, path, target=None):
# If jvm is not started then we just check against the TLDs
if not _jpype.isStarted():
base = name.partition('.')[0]
if not base in _JDOMAINS:
return None
raise ImportError("Attempt to create Java package '%s' without jvm" % name)
# Check for aliases
if name in _JDOMAINS:
jname = _JDOMAINS[name]
if not _jpype.isPackage(jname):
raise ImportError("Java package '%s' not found, requested by alias '%s'" % (jname, name))
ms = _ModuleSpec(name, self)
ms._jname = jname
return ms
# Check if it is a TLD
parts = name.rpartition('.')
# Use the parent module to simplify name mangling
if not parts[1] and _jpype.isPackage(parts[2]):
ms = _ModuleSpec(name, self)
ms._jname = name
return ms
if not parts[1] and not _jpype.isPackage(parts[0]):
return None
base = sys.modules.get(parts[0], None)
if not base or not isinstance(base, _jpype._JPackage):
return None
# Support for external modules in java tree
name = unwrap(name)
for customizer in _CUSTOMIZERS:
if customizer.canCustomize(name):
return customizer.getSpec(name)
# Using isPackage eliminates need for registering tlds
if not hasattr(base, parts[2]):
# If the base is a Java package and it wasn't found in the
# package using getAttr, then we need to emit an error
# so we produce a meaningful diagnositic.
try:
# Use forname because it give better diagnostics
cls = _jpype._java_lang_Class.forName(name, True, _jpype.JPypeClassLoader)
# This code only is hit if an error was not thrown
if cls.getModifiers() & 1 == 0:
raise ImportError("Class `%s` is not public" % name)
raise ImportError("Class `%s` was found but was not expected" % name)
# Not found is acceptable
except Exception as ex:
raise ImportError("Failed to import '%s'" % name) from ex
# Import the java module
return _ModuleSpec(name, self)
""" (internal) Loader hook for importlib. """
def create_module(self, spec):
if spec.parent == "":
return _jpype._JPackage(spec._jname)
parts = spec.name.rsplit('.', 1)
rc = getattr(sys.modules[spec.parent], parts[1])
# Install the handler
rc._handler = _JExceptionHandler
return rc
def exec_module(self, fullname):
pass
If you still would like a custom loader. How did you want it to be different from _JImportLoader
? Did you want to try to put the base package lookup within the loader and only alias the base packages? Were there other improvements?
If not, I wanted to get your opinion on how the current implementation might benefit from a decorator around the that around the _launch method that would call the _wrap_runtime_top_level_java_packages
That way the code would call _wrap_runtime_top_level_java_packages
only after the subclass calls it's own _launch
. I think it is possible for extended classes to inherit a decorator. Also would appreciate any feedback on the naming, classmethods vs static, or basic python wisdom.
Let me know what your thoughts and I will take another shot once I'm clear on the task.
One of the key reasons we think it will be better to switch to a loader is that by adding it using sys.meta_path.append
it will always run after jpype's loader. As such the new loader won't have to duplicate any of this logic. This will free it up to only run against items with a conflict, which are provided by Python directly so there won't be a need for additional sets or loops to keep track of items.
We expect the end result will require a lot less code and be easier to maintain as a result.
Ah, I think I see it now. That does seem a bit simpler.
So we will assume the dev will still use the '_' for conflicting packages.
For a quick list of requirements for a pyhidra loader:
sys.meta_path.append
Quick version will be like:
class _PyhidraImportLoader:
def find_spec(self, name, path, target=None):
if name.endswith('_') and _jpype.isPackage(jname):
ms = _ModuleSpec(name, self)
ms._jname = jname
return ms
def create_module(self, spec):
if spec.parent == "":
return _jpype._JPackage(spec._jname)
parts = spec.name.rsplit('.', 1)
rc = getattr(sys.modules[spec.parent], parts[1])
# Install the handler
rc._handler = _JExceptionHandler
return rc
def exec_module(self, fullname):
pass
I will start on an implementation and see how it fairs in testing. Let me know if I'm on the right track and I'll see if I can get it in this weekend.
It sounds like you're on a good path. We look forward to reviewing the updates on Monday.
Does the jpype alias function not cover this? You should be able to add an alias to jpype such that it looks up Java pdb to something more name complient like gov.pdb or gov.ghidra.pdb using the register funcion.
I added a _PyhidraImportLoader and agree this way is quite clean and efficient. I added test cases for every mode of import I could think of and ensured that the new loader would handle all the base packages correctly. Let me know if it needs any other changes.
Does the jpype alias function not cover this? You should be able to add an alias to jpype such that it looks up Java pdb to something more name complient like gov.pdb or gov.ghidra.pdb using the register funcion.
The registerDomain
function definitely handles the case, but then we would have to make the call for every package, like I did originally. https://github.com/clearbluejar/pyhidra/commit/e48968e8e5344ed63dcaa18386065d4d4251a581#diff-41d9da554fbcd3b0cd1bd1be57cb50000748fbe459ac4dcf7bed6794c41554fcR159-R160
The custom loader just makes the classes available without having to register them at all, or have wait until the Ghidra application is fully initialized.
I made the review changes. Should I rebase and squash the commits?
That last force push was needed to set the author.
When trying to port over this script using Ghidra
pdb
Java package to Python leveragingpyhidra
, I realized an issue with a python stdlib namepdb
(python debugger) conflicting with Ghidra's base Java packagepdb
.When I tried to import
pdb
I received the python stdlib module:It ends up using a different standard
frozen_importlib_external.SourceFileLoader
when imported:instead of the desired jpype loader for the module.
This is actually a standard issue with
jpype
.This is because the
jpype
module spec finder is the last in the list forsys.meta_path
after it is appended using https://github.com/jpype-project/jpype/blob/f6ad8f4e5c6ece1f5a224de420968fa508f109fa/jpype/imports.py#L224-L225:There are several ways to go about handling the issue. The developer could simply give priority to the
jpype
finder by modifying the sys.meta_path order, giving precedence to_JImportLoader
:Option 1:
Option2:
The problem here is that this would simply reverse the problem. Then standard python modules that a dev might want to access wouldn't be available by the standard
import <mod>
.I thought maybe a class method to ensure the load of a Java module might work. You could force the load of a module using the
jpype
loader by adding this class to the base class of launcher.The issue I see with this is that you would still need to call this method for each package you wanted to ensure used a Java package.
or just call it first and then call import as the mod would be loaded in sys.modules
After seeing how
jpype
"handles" the issue, I think it might be easier to follow their lead. They simply wrap conflicting keywords to avoid the conflict.I derived the base Java package names from sorting all the top level directories from ghidra source:
We can register all the base Ghidra base Java package names with a name that won't conflict. Simply append a '_' to all Ghidra base Java packages names in the initialization.
pyhidra/launcher.py
This would allow the developer to do the following and know that they are importing the correct package:
The changes for the PR add a method inside the base
launcher.start
that would initialize each base class with thejpype.imports.registerDomain
. From that point on, all the base classes are available to to developer without conflicts. The developer can still use the originalimport <non conflicting base class>
but will need to use theimport mod_ as mod
for conflicting packages. This should somewhat future proof as python and Ghidra Java packages change.Might want to add something to the README to account for this. Something like: