deephaven / deephaven-core

Deephaven Community Core

Make Python import Java types using `import` #1525

Open chipkent opened 2 years ago

chipkent commented 2 years ago

PR #1450 is attempting to address the need for Java types in Python without forcing users to touch jpy. During the discussions on this PR, it became clear that importing Java types via Python's import statement, without the explicit use of jpy, would be a superior solution. Some research into the subject makes the strategy look viable.

The proposed syntax is something like:

```python
from java.text import SimpleDateFormat

sdf = SimpleDateFormat("...")
```

To make this feature work, we need to hook into Python's import system. These documents indicate that it is possible:

- https://realpython.com/python-import/#finders-and-loaders
- https://docs.python.org/3/library/importlib.html
- https://www.python.org/dev/peps/pep-0302/
- https://www.python.org/dev/peps/pep-0451/

mofojed commented 2 years ago

@jmao-denver I found a couple of useful tidbits when I was looking into something similar, in particular: https://docs.python.org/3/library/functions.html#import__

> This function is invoked by the `import` statement. It can be replaced (by importing the `builtins` module and assigning to `builtins.__import__`) in order to change semantics of the `import` statement, but doing so is strongly discouraged as it is usually simpler to use import hooks (see PEP 302) to attain the same goals and does not cause issues with code which assumes the default import implementation is in use. Direct use of `__import__()` is also discouraged in favor of `importlib.import_module()`.

It suggests looking into Import Hooks: https://www.python.org/dev/peps/pep-0302/

devinrsmith commented 2 years ago

PEP 302 is very old, and I'm assuming it is still the low-level way things work. I'd check whether PEP 451 is more relevant, and would also see whether other libraries are doing this and whether there are more modern or organized ways of setting it up.
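For reference, a PEP 451-style hook comes down to a meta-path finder whose `find_spec` returns a `ModuleSpec`, plus a loader whose `exec_module` fills in the module. The following is only a minimal sketch: `JavaPackageFinder`, `JavaPackageLoader` and the `FAKE_JAVA_PACKAGES` dict are hypothetical, the dict stands in for a real query to the JVM, and the generated classes are empty placeholders rather than Java proxies.

```python
import importlib.abc
import importlib.machinery
import sys

# Hypothetical stand-in for the Java side: maps a package name to the class
# names it contains. A real bridge would ask the JVM instead.
FAKE_JAVA_PACKAGES = {
    "java": [],
    "java.text": ["SimpleDateFormat", "DecimalFormat"],
}


class JavaPackageLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        for name in FAKE_JAVA_PACKAGES[module.__name__]:
            # Placeholder type; a real bridge would return a Java class proxy.
            setattr(module, name, type(name, (), {"__module__": module.__name__}))


class JavaPackageFinder(importlib.abc.MetaPathFinder):
    """PEP 451 finder that claims any package name found in the registry."""

    def find_spec(self, fullname, path, target=None):
        if fullname not in FAKE_JAVA_PACKAGES:
            return None  # defer to the regular import machinery
        # is_package=True gives the module a __path__ so "java.text" can resolve.
        return importlib.machinery.ModuleSpec(
            fullname, JavaPackageLoader(), is_package=True
        )


sys.meta_path.insert(0, JavaPackageFinder())

from java.text import SimpleDateFormat

print(SimpleDateFormat.__module__)  # java.text
```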

jmao-denver commented 2 years ago

@mofojed @devinrsmith @chipkent, thanks for all the useful info. I tried something really simple with the finder approach Chip discovered, but it doesn't look promising without further work. What I found is that when Python processes an import statement such as `from X.Y import Z` for the first time, the custom finder does get called, but afterwards any import statement involving X, Y, or X.Y no longer invokes the custom finder, because those modules are already in `sys.modules`, which makes sense from a performance perspective. I haven't tried the tip Bender shared, but it will most likely result in the same situation. This means that we can't dynamically add symbols to a module at import time once it has already been loaded. I am not sure whether JNI, or Java in general, allows inspecting a Java package to retrieve all of its public elements. If it does, then we could build up the target module namespace on the fly on the first import.
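The caching behavior described above is easy to reproduce. In this sketch, `CountingFinder`, `EmptyLoader` and the top-level name `jdemo` are hypothetical; the finder records every name it is asked to resolve, and the repeated imports are served straight from `sys.modules` without reaching it.

```python
import importlib.abc
import importlib.machinery
import sys


class EmptyLoader(importlib.abc.Loader):
    """Loader that produces empty package modules."""

    def exec_module(self, module):
        pass  # nothing to populate in this demo


class CountingFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that records every name it is asked about."""

    def __init__(self):
        self.calls = []

    def find_spec(self, fullname, path, target=None):
        if fullname.split(".")[0] != "jdemo":
            return None  # not ours; defer to the regular finders
        self.calls.append(fullname)
        return importlib.machinery.ModuleSpec(fullname, EmptyLoader(), is_package=True)


finder = CountingFinder()
sys.meta_path.insert(0, finder)

import jdemo.text  # first import: the finder is asked for "jdemo", then "jdemo.text"
import jdemo.text  # already in sys.modules: the finder is not consulted
import jdemo       # likewise served from the cache

print(finder.calls)  # ['jdemo', 'jdemo.text']
```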

devinrsmith commented 2 years ago

JPype (https://jpype.readthedocs.io/en/latest/userguide.html) uses this too; I bet we can look at how they do it to see whether the same approach would work for us.

devinrsmith commented 2 years ago

https://github.com/jpype-project/jpype/blob/master/jpype/imports.py

Thrameos commented 2 years ago

Please note you will also need something like the support classes under

https://github.com/jpype-project/jpype/tree/master/native/java/org/jpype/pkg

The Python portion that you identified is not difficult, but getting a listing of what a package should contain requires probing on the Java side.

In Java, you can interrogate a class using reflection to find its contents, but it is much harder to interrogate a package to find out which classes it contains. To find that out, you have to use a combination of the jar and jrt filesystems. It gets complicated when you consider the transition from Java 8 to Java 9+, and multi-release jars add complexity that requires additional special cases. There are also rare cases, such as jars that don't contain a zip index, from which it is impossible to get a listing of classes.
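The jar half of that probing can be sketched in Python with the standard `zipfile` module. This is an illustration under stated assumptions, not JPype's implementation: `list_package_classes` and its `java_version` parameter are hypothetical, it treats any `.class` entry without `$` as a top-level class, and it ignores the jrt filesystem, the `Multi-Release` manifest attribute and class visibility.

```python
import io
import zipfile


def list_package_classes(jar_bytes, package, java_version=17):
    """Hypothetical helper: simple names of top-level classes in `package`.

    Entries under META-INF/versions/<N>/ are included when N <= java_version.
    The Multi-Release manifest attribute, the jrt filesystem (JDK classes on
    Java 9+) and access flags are NOT checked in this sketch.
    """
    prefix = package.replace(".", "/") + "/"
    names = set()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for path in jar.namelist():
            if path.startswith("META-INF/versions/"):
                # "META-INF/versions/<N>/<path>" -> ["META-INF", "versions", N, path]
                parts = path.split("/", 3)
                if len(parts) < 4 or not parts[2].isdigit():
                    continue
                if int(parts[2]) > java_version:
                    continue
                path = parts[3]
            if not (path.startswith(prefix) and path.endswith(".class")):
                continue
            simple = path[len(prefix):-len(".class")]
            # Skip subpackages, nested classes and package-info/module-info.
            if "/" in simple or "$" in simple or "-" in simple:
                continue
            names.add(simple)
    return sorted(names)


# Usage: build a small in-memory jar and list the classes in com.example.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("com/example/Foo.class", b"")
    z.writestr("com/example/Foo$Inner.class", b"")
    z.writestr("com/example/sub/Bar.class", b"")
    z.writestr("com/example/package-info.class", b"")
    z.writestr("META-INF/versions/11/com/example/Baz.class", b"")
    z.writestr("META-INF/versions/21/com/example/Qux.class", b"")
print(list_package_classes(buf.getvalue(), "com.example"))  # ['Baz', 'Foo']
```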

There is also a special pattern required if you want to determine whether a class is public. We used to include private classes in the index, but that leads to lots of security exceptions when they are probed. They don't make this one easy: the access flags come after the constant pool, and there is no index to skip directly to the flags, so you have to decode the whole pool first.
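To make the constant-pool point concrete, here is a sketch that reads a class file's `access_flags` by walking every constant-pool entry, using the entry sizes from the class-file format chapter of the JVM specification. `class_access_flags` and `is_public_class` are hypothetical helper names; the sketch only looks at the top-level `access_flags` (nested classes would also need the `InnerClasses` attribute) and does no validation beyond the magic number and known tags.

```python
import struct

ACC_PUBLIC = 0x0001

# Body size in bytes of each fixed-size constant-pool entry, keyed by tag.
# CONSTANT_Utf8 (tag 1) is variable-length and handled separately.
_FIXED_SIZES = {
    3: 4, 4: 4,          # Integer, Float
    5: 8, 6: 8,          # Long, Double (each occupies two pool slots)
    7: 2, 8: 2,          # Class, String
    9: 4, 10: 4, 11: 4,  # Fieldref, Methodref, InterfaceMethodref
    12: 4,               # NameAndType
    15: 3, 16: 2,        # MethodHandle, MethodType
    17: 4, 18: 4,        # Dynamic, InvokeDynamic
    19: 2, 20: 2,        # Module, Package
}


def class_access_flags(data):
    """Return access_flags, which can only be reached after the whole pool."""
    magic, _minor, _major, count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    offset = 10  # u4 magic + u2 minor + u2 major + u2 constant_pool_count
    index = 1    # constant-pool indices run from 1 to count - 1
    while index < count:
        tag = data[offset]
        offset += 1
        if tag == 1:  # CONSTANT_Utf8: u2 length followed by that many bytes
            (length,) = struct.unpack_from(">H", data, offset)
            offset += 2 + length
        elif tag in _FIXED_SIZES:
            offset += _FIXED_SIZES[tag]
        else:
            raise ValueError(f"unknown constant-pool tag {tag}")
        index += 2 if tag in (5, 6) else 1
    (flags,) = struct.unpack_from(">H", data, offset)
    return flags


def is_public_class(data):
    return bool(class_access_flags(data) & ACC_PUBLIC)


# Usage: a hand-built class file with a Utf8 "Foo" and a Class entry,
# followed by access_flags = ACC_PUBLIC | ACC_SUPER.
pool = b"\x01" + struct.pack(">H", 3) + b"Foo" + b"\x07" + struct.pack(">H", 1)
header = struct.pack(">IHHH", 0xCAFEBABE, 0, 52, 3)
print(is_public_class(header + pool + struct.pack(">H", 0x0021)))  # True
```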

If you have questions, just poke me, as I am the main author of JPype and all the support classes it uses. It does sound a bit like I am helping a competitor, though as JPype is currently only Python to Java and not the reverse, there may be a good reason to use jpy here. If you are interested in helping to finish the JPype Java-to-Python bridge (which makes Python a first-class language in Java), I am always looking for help.

devinrsmith commented 2 years ago

I'm definitely interested in learning more about JPype. I believe we researched it years ago, but we needed support in both directions. JPype popped up in my RSS reader via https://blog.codecentric.de/en/2021/11/java-classes-python/, and I thought it was timely given some of the things we are working on right now. Thanks for the input @Thrameos!

Thrameos commented 2 years ago

Yes, JPype is primarily one-way. It is possible for Java to call Python, but only if Python creates an instance of a Java interface. Its primary claim to fame is that it presents Java to Python with customizers such that all Java code appears as Python and supports duck-typed compatibility for everything down to the buffer level (fast transfer to and from). As my main use is scientific coding, speed and a native look were critical. It even does things like exposing Java documentation in Python. About the only features it is missing are integrated stub generation and IO integration such that Java can operate on a Python file concept.

There is an experimental branch called epypj, which is a reverse bridge. It uses Java ASM to create Java versions of Python classes, basically synthesizing Java code backed by Python on the fly. It adds all of the Python types as mixins based on their capabilities, so if something is a Python iterable, it appears as a Java iterable, and so forth. I even set it up so that you can write customizers for specific classes like pyplot to make them more Java-native. It does this through a complete Python FFI (Foreign Function Interface) for the Python backend in Java, so any CPython C code can be called from within Java (assuming the address for the function was registered). Thus it isn't so much allowing access to Python as making Python appear as Java code (with a bunch of casting, because loosely typed Python objects returned in Java must be cast to the type that exposes the additional functionality).

Unfortunately, I never could get other contributors to help with the challenging part: writing the required test bench and packaging so that epypj can be used in the same way as jpy. The JPype code is structured strictly as a Python module, so it isn't really geared to work with something like Maven. I can launch a Python shell from within Java, call anything in Java from that shell, create Python types in Java code, and pass them back and forth, but before it can be a release product it would need comprehensive testing of all the code paths, and with a complete FFI that means many hundreds of potential interactions. My employer also put a damper on things by refusing to sign the Python contributor agreement, so I can't contribute back hooks and improvements that would make the task easier.

The other significant issue is memory management. When operating in one direction, it is difficult, though not impossible, to create reference loops in which a Java resource holds a Python resource which in turn holds a Java resource. These sorts of loops are unresolvable under JNI and the CPython API: the Java GC can't call through the CPython visitor to discover that something is a circular reference, and CPython can't see Java-held Python resources. Thus both languages end up with items stuck at positive reference counts. The bidirectional bridge is therefore usable, but avoiding memory issues requires not storing a resource that is a container in the other language. I spent a great deal of time looking through the protocols for Java RMI to see if there was a way to treat Python reference counting as a remote protocol for the Java GC, but never reached a conclusion.

jmao-denver commented 2 years ago

Here is another jpy-like project (Jep); it also supports the syntax we like: https://github.com/ninia/jep/wiki/Getting-Started

chipkent commented 2 years ago

This talk may be helpful for the implementation. https://www.youtube.com/watch?v=ziC_DlabFto&list=PL2Uw4_HvXqvYeXy8ab7iRHjA-9HiYhRQl&index=37

Thrameos commented 2 years ago

The issue is not the Python import system. That is easy (see jpype/imports.py): simply implement the `find_spec` methods and import your wrapper classes. The challenging part is determining which objects belong in an imported Java package. If you stick to only named objects (no wildcards), then just checking for existence and wrapping is fine. If you want wildcards, statics, and enums on import, then you will need some Java support classes to probe.
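For the named-object case, one possible approach (a minimal sketch, not necessarily what jpype/imports.py does) is a module-level `__getattr__` (PEP 562, Python 3.7+) that checks for and wraps a class the first time it is named, so new names can be resolved even after the package module is already in `sys.modules`. `make_lazy_package` and `KNOWN_CLASSES` are hypothetical; the set stands in for asking the JVM whether a class exists, and the wrapper types are empty placeholders.

```python
import sys
import types

# Hypothetical existence check: a real bridge would ask the JVM whether the
# fully qualified class name can be loaded.
KNOWN_CLASSES = {"java.util.ArrayList", "java.util.HashMap"}


def make_lazy_package(name):
    """Create a module whose attributes are resolved on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):  # PEP 562 hook, only called for missing names
        fqcn = f"{name}.{attr}"
        if fqcn not in KNOWN_CLASSES:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        # Placeholder wrapper type standing in for a Java class proxy.
        wrapper = type(attr, (), {"__module__": name})
        setattr(module, attr, wrapper)  # cache so the hook runs once per name
        return wrapper

    module.__getattr__ = __getattr__
    return module


sys.modules["java.util"] = make_lazy_package("java.util")

from java.util import ArrayList  # resolved by __getattr__ on first use

print(ArrayList.__module__)  # java.util
```

Because the module defines no `__all__`, `from java.util import *` would only pick up names that have already been resolved and cached, which is why wildcard support needs a real package listing from the Java side.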