OSGeo / PROJ-JNI

Java Native Interface for PROJ
https://osgeo.github.io/PROJ-JNI/
MIT License

Build native for multiple platforms #40

Open willcohen opened 3 years ago

willcohen commented 3 years ago

I started to work through this a little this morning.

@desruisseaux it looks like you went down this path before, per commit 3a2a611fa884e2f67226386d0c22bea92043fb52: "Abandon the attempt to provide native files for different platforms in the same JAR file." Was the reason due to size?

Observations so far:

willcohen commented 3 years ago

Additional question -- to simplify the compilation/linking of all this glue code, would you be open to switching to Panama's native interoperability? I think since JDK 16 it has gained enough memory access functions that it works reasonably well, and JDK 17 LTS is coming out rather soon. I can try poking around to see if it works at all (via jextract etc). I suspect it would dramatically simplify the code base if it works, since I think we'd be able to excise all the internal C++ code, but it would mean that the minimum JDK would probably be 16/17 and not 11.

desruisseaux commented 3 years ago

It was for 3 reasons, with file size indeed one of them:

We can mitigate the file size issue by providing the Java code in a single JAR file and the JNI bindings in separate files for each platform. This is the strategy applied by JavaFX, for example, which uses a custom Maven plugin to select the right JAR file automatically.

desruisseaux commented 3 years ago

Yes, I thought about Panama too. But it is still in incubation even in JDK 17. Maybe more importantly, in my understanding the current Panama version works with C only, not yet with C++, Fortran, etc. Compatibility with C++ is on their "to do" list, but for a future version. PROJ-JNI code relies extensively on the PROJ C++ API and uses a lot of features not available through the PROJ C API. It also has some tricky code managing the interaction between C++ "smart pointers" and the Java garbage collector, and I do not know whether that functionality would be easy to reproduce in Panama.

willcohen commented 2 years ago

As a quick update here: I've been messing around with this some more, using dtype-next. I still don't have anything clean enough to show in a repo just yet, but this looks like it'll be able to help with the native libraries problem. Before JDK 17 it uses JNA, and for JDK 17+ it uses Panama; it can switch between them pretty seamlessly, AND it is able to output a Java API rather than force everyone into Clojure. (edit: it also supports Graal, though I haven't quite gotten that working yet here!)

My one request would be that PROJ-JNI hold off on deploying anything to Maven Central in the meantime while I mess around with this a little more -- if this works the way I think it does, this might end up being the most maintenance-free way to get cross-platform native bindings to the JVM as an option (it can also draw from the system library path, of course, so I think there could be -slim variants that don't package anything specifically, to avoid the size issue) with low overhead.
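
The JDK-dependent switch described above could look roughly like this in plain Java. This is only a sketch of the idea and not how dtype-next implements it: `BackendChooser`, `Backend` and `choose` are hypothetical names, and the threshold of 17 is taken from the comment.

```java
// Hypothetical sketch: pick a foreign-function backend from the JDK feature version.
final class BackendChooser {
    enum Backend { PANAMA, JNA }

    // Assumption from the thread: Panama from JDK 17 onward, JNA before that.
    static Backend choose(int featureVersion) {
        return featureVersion >= 17 ? Backend.PANAMA : Backend.JNA;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns the major release number (11, 17, 21, ...).
        int feature = Runtime.version().feature();
        System.out.println("JDK " + feature + " -> " + choose(feature));
    }
}
```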

willcohen commented 2 years ago

I still need to clean up the proof-of-concept into a workable repo, but it definitely works! REPL output here:

proj.api> (def p1 (proj-coord-array {:n 1}))
;; => #'proj.api/p1
proj.api> p1
;; => [{:x 0.0, :y 0.0, :z 0.0, :t 0.0}]
proj.api> (native/proj_trans_array (proj-create-crs-to-crs {:source-crs "EPSG:4326" :target-crs "EPSG:3586"}) 1 1 p1)
;; => 0
proj.api> p1
;; => [{:x 3.0250865971411645E7, :y -610981.481754199, :z 0.0, :t 0.0}]

It first initializes an array containing one proj_coord (well, an array with one struct with four doubles created externally) and calls it p1, creates a transformation between the two CRSs using proj_create_crs_to_crs, and then passes both to proj_trans_array. Everything is running in the JVM except for the original proj lib. It works on Panama for JDK 17+, and can fall back to JNA in all other cases.

The main issue is I haven't currently figured out how to deal with pass-by-value functions (hence the need for proj_trans_array, which can take the pointer, versus proj_trans), so this may not immediately be able to mirror the entire API. However, this means that a jar can contain X number of precompiled proj libraries for other platforms and it should work without needing an additional compilation step!
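
To illustrate why the pointer-taking function is the easier one to bind, here is a minimal Java sketch of the memory layout involved: n coordinates stored back to back, each as four doubles (x, y, z, t), in one off-heap buffer whose address a binding could pass to proj_trans_array. The class and method names are hypothetical, and nothing here calls PROJ.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: contiguous (x, y, z, t) doubles, one 32-byte record per coordinate.
final class CoordArray {
    static final int DOUBLES_PER_COORD = 4;                               // x, y, z, t
    static final int BYTES_PER_COORD = DOUBLES_PER_COORD * Double.BYTES;  // 32 bytes

    // Direct (off-heap) buffer in native byte order; zero-filled on allocation.
    final ByteBuffer buffer;

    CoordArray(int n) {
        buffer = ByteBuffer.allocateDirect(n * BYTES_PER_COORD).order(ByteOrder.nativeOrder());
    }

    // Write coordinate i at byte offset i * 32.
    void set(int i, double x, double y, double z, double t) {
        int base = i * BYTES_PER_COORD;
        buffer.putDouble(base, x);
        buffer.putDouble(base + Double.BYTES, y);
        buffer.putDouble(base + 2 * Double.BYTES, z);
        buffer.putDouble(base + 3 * Double.BYTES, t);
    }

    // Read component 0..3 (x, y, z, t) of coordinate i.
    double get(int i, int component) {
        return buffer.getDouble(i * BYTES_PER_COORD + component * Double.BYTES);
    }

    public static void main(String[] args) {
        CoordArray coords = new CoordArray(1);
        coords.set(0, 10.0, 20.0, 0.0, 0.0);
        System.out.println("x=" + coords.get(0, 0) + " y=" + coords.get(0, 1));
    }
}
```

A by-value function such as proj_trans needs the whole struct copied into the call itself, which is the part the binding layer has to support; with the array variant only an address crosses the boundary.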

desruisseaux commented 2 years ago

Is it mirroring the PROJ C API? The C API is specific to PROJ, while the C++ API is very close to the model of ISO 19111 international standard.

willcohen commented 2 years ago

C, though I see no reason the bulk of the C portion of the ISO functionality couldn’t be implemented fairly quickly (@kbevers I suspect I’m entering a minefield here!)

Re smart pointers, the good part about dtype-next is that it has a clean method (pardon the Clojure) for attaching the disposal functions to the pointer objects at the time of creation, so when the JVM decides it’s done with the pointer on the JVM side it’ll call back to PROJ to dispose, or if it’s otherwise allocated memory it can free it too.
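
For readers who do not use Clojure, the same idea (attach a disposal function to a pointer wrapper when it is created) can be sketched with the standard `java.lang.ref.Cleaner`. This is not dtype-next's API and not PROJ-JNI's code; `NativeHandle` and `Destroyer` are hypothetical names, and the native destructor call is replaced by a counter so the example runs without PROJ.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a wrapper around a native address with a disposal action attached at creation.
final class NativeHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Stand-in for the native destructor, so the sketch can be observed without PROJ.
    static final AtomicInteger destroyed = new AtomicInteger();

    // The cleanup action must not reference the NativeHandle itself,
    // otherwise the handle could never become unreachable.
    private static final class Destroyer implements Runnable {
        private final long address;

        Destroyer(long address) {
            this.address = address;
        }

        @Override
        public void run() {
            // A real binding would release the native object here, e.g. proj_destroy(address).
            destroyed.incrementAndGet();
        }
    }

    private final long address;
    private final Cleaner.Cleanable cleanable;

    NativeHandle(long address) {
        this.address = address;
        this.cleanable = CLEANER.register(this, new Destroyer(address));
    }

    long address() {
        return address;
    }

    // Explicit disposal; Cleanable.clean() runs the action at most once.
    @Override
    public void close() {
        cleanable.clean();
    }

    public static void main(String[] args) {
        try (NativeHandle handle = new NativeHandle(0x1234L)) {
            System.out.println("using handle at address " + handle.address());
        }
        System.out.println("destroy calls so far: " + destroyed.get());
    }
}
```

If `close()` is never called, the Cleaner runs the same action some time after the wrapper becomes unreachable, which is the garbage-collector-driven path described above.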

There’s definitely a few issues left to work through, but it does seem like there’s some real upsides for sure! It may be that for the most C++-API-like approach, PROJ-JNI is still the best bet, but for the most portable JVM solution, this is another path.

desruisseaux commented 2 years ago

It is not a minefield, it just depends on the goal. The C API provides relatively opaque objects for CRS definitions and coordinate operations. It is possible to get some information like ellipsoid axis lengths, but not as detailed and unambiguous as what the C++ API allows. On the other side, the C++ API maps ISO 19111 almost fully.

It may be a trade-off between leveraging the power of ISO 19111 and avoiding the need to compile JNI locally. If we cannot have both at the same time, we may consider whether we want the two approaches to coexist, and how.

willcohen commented 2 years ago

That actually makes a lot of sense in distinguishing the use cases. In a situation detail-oriented enough to want the unambiguity of full compatibility with the spec, maybe it’s not that much extra work having one library installed locally and doing a little legwork to get this compiled for those needs.

willcohen commented 2 years ago

@kbevers @desruisseaux

Just a quick update here as I keep working through this. The one major remaining task I set for myself before trying to post a first version of a working repo based on https://github.com/OSGeo/PROJ-JNI/issues/40#issuecomment-1012679044 was to figure out fallbacks for users on platforms where the jar doesn't include a natively-compiled PROJ library.

My current thought process is the following, and I'd love some feedback on it:

/* @type {function(...):?} */ var _proj_xy_dist = Module["_proj_xy_dist"] = createExportWrapper("proj_xy_dist");

/* @type {function(...):?} */ var _proj_trans_array = Module["_proj_trans_array"] = createExportWrapper("proj_trans_array");

/* @type {function(...):?} */ var _proj_create_crs_to_crs = Module["_proj_create_crs_to_crs"] = createExportWrapper("proj_create_crs_to_crs");

/* @type {function(...):?} */ var _proj_context_get_database_path = Module["_proj_context_get_database_path"] = createExportWrapper("proj_context_get_database_path");


- This would then mean that if there's no binary version of PROJ, then there's conceivably still a way that those users could run the WASM version on the JVM.

Perhaps more importantly, adding a WASM build into the mix actually means that the Clojure wrapper would then be able to serve double duty.
The `.clj` version of the wrapper would either call natively into the correct compiled version of PROJ, or fall back to calling the WASM version for JVM users on other platforms. I'd create a Java API to hide all the Clojure stuff so that any JVM user should be able to reference the various C interface functions via the various Panama/JNA/etc paths.
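
The "native first, WASM as fallback" order described here can be sketched in Java as a list of loaders tried in sequence. Everything in this sketch is hypothetical (`ProjBackends`, `Backend`, `firstAvailable`, and both loader methods); `System.loadLibrary` only stands in for whatever Panama or JNA lookup a real wrapper would perform, and the WASM loader is a placeholder that starts no runtime.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: try backends in order of preference and keep the first that loads.
final class ProjBackends {
    // A real backend would expose the wrapped PROJ C functions; here it only has a name.
    interface Backend {
        String name();
    }

    static Backend firstAvailable(List<Supplier<Backend>> loaders) {
        for (Supplier<Backend> loader : loaders) {
            try {
                return loader.get();
            } catch (UnsatisfiedLinkError | RuntimeException e) {
                // This backend is not usable on the current platform; try the next one.
            }
        }
        throw new IllegalStateException("No PROJ backend available");
    }

    static Backend nativeBackend() {
        // Throws UnsatisfiedLinkError when no "proj" library is found on java.library.path.
        System.loadLibrary("proj");
        return () -> "native";
    }

    static Backend wasmBackend() {
        // Placeholder: a real implementation would start a JS/WASM runtime here.
        return () -> "wasm";
    }

    public static void main(String[] args) {
        List<Supplier<Backend>> loaders = List.of(ProjBackends::nativeBackend, ProjBackends::wasmBackend);
        System.out.println("selected backend: " + firstAvailable(loaders).name());
    }
}
```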

Since Clojure also targets JavaScript, this would then open the door to having an analogous ClojureScript `.cljs` interface to just the WASM version. I would also then want to target a JavaScript API that similarly references all the C functions of PROJ, accessed via the WASM version, and I guess that could become an npm library too.

Assuming this works the way I think it will, this'd mean that there'd be a way to have both JVM and JS ecosystems have access to the upstream version of PROJ as-native-as-possible, without needing to deal with C or the native compilation.

The substantial downside, though, as noted above is that this stays C only in terms of interoperability, so all of the C++ API wouldn't be present for either of these two options.

desruisseaux commented 2 years ago

I think it would be helpful to have the two parts of the work as two separate branches:

For the first branch, we can reduce the size of the JAR files by splitting them as below:

The pom.xml file for the JAR file containing the Linux binaries would look like this (simplified):

<groupId>org.osgeo</groupId>
<artifactId>proj-bin</artifactId>
<version>1.0-SNAPSHOT</version>
<plugins>
  <plugin>
    <artifactId>maven-jar-plugin</artifactId>
    <configuration>
      <classifier>linux</classifier>
    </configuration>
  </plugin>
</plugins>

For other platforms, we would use the same pom.xml with only a different value in the <classifier> element. Then in the main pom.xml (the one for the pure Java code), we could have something like:

<dependency>
  <groupId>org.osgeo</groupId>
  <artifactId>proj-bin</artifactId>
  <version>${project.version}</version>
  <classifier>${platform}</classifier>
</dependency>

<profiles>
  <profile>
    <id>linux</id>
    <activation>
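      <os>
        <!-- Maven's "unix" family also matches macOS, so the OS name is checked as well. -->
        <family>unix</family>
        <name>linux</name>
      </os>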
    </activation>
    <properties>
      <platform>linux</platform>
    </properties>
  </profile>
  <!-- Same for Windows, MacOS, etc. -->
</profiles>

An example is provided in Nexus Tips and Tricks section 5.5.3 (Platform Classifiers). JavaFX uses a similar technique, but with a custom Maven plugin instead of Maven classifiers.

With this approach, users would download only the JAR file for their platform and we would not have to restrain the number of supported platforms because of file size concerns. (I wonder however who can manage to build a JAR for each platform…)
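
The Maven profiles above pick the classifier at build time. The same mapping can be sketched in Java, for example to decide at run time which bundled binary applies. Only the `linux` classifier appears in the proposal; the `macos` and `windows` names, and the `PlatformClassifier` class itself, are assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical sketch: map the "os.name" system property to a platform classifier.
final class PlatformClassifier {
    static String classifierFor(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) {
            return "linux";
        }
        // Checked before "win" because "darwin" contains the substring "win".
        if (os.contains("mac") || os.contains("darwin")) {
            return "macos";
        }
        if (os.contains("win")) {
            return "windows";
        }
        throw new IllegalArgumentException("Unsupported platform: " + osName);
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        System.out.println(osName + " -> " + classifierFor(osName));
    }
}
```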

desruisseaux commented 2 years ago

One more thing: the pure Java code is under the MIT license, but the proj-bin JAR file containing the PROJ binary, if it includes the EPSG database, will have to be under the MIT license plus the EPSG terms of use.

willcohen commented 2 years ago

Makes sense. I'll see if I can figure out the Maven method to help point to the right jar. In terms of building, I've been able to get Windows + Mac working via a script, and I can get a bunch of Linux architectures to compile with dockcross. The eventual plan, I think, would be to have GitHub Actions (which does support running on a Mac builder and a Windows one) use the Mac instance to build Mac natively plus Linux via dockcross for as many architectures as possible, and build Windows natively as well.

desruisseaux commented 2 years ago

I added a wiki page for a Maven project layout proposal:

https://github.com/OSGeo/PROJ-JNI/wiki/LayoutProposal

Please feel free to edit.

willcohen commented 2 years ago

Hi all. As a quick followup, it took me much longer than I expected to get sqlite and libtiff to link correctly, but I've successfully built a working PROJ 9.0.1 as WebAssembly using Emscripten. This means that for platforms that don't have access to a built binary, it'll be possible to fall back to a JS build of PROJ, and once I get the wasm fully working in pure JS, it should work with graaljs -- which to my knowledge is pure Java -- meaning that native (well, transpiled) PROJ should work for anyone on the JVM.

Here I allocate an array of one coordinate and transform it via webassembly:

[image]

In the next few weeks I'll try to get this working prototype posted. There's still a little more cleanup to do!

@hobu

Edit: it works as a prototype on graaljs too, if a little slowly, since I haven't yet figured out how to import the sqlite db into the emscripten filesystem with a graaljs context rather than embedding them in the js itself:

(eval context
    "var o1 = _malloc(32);
     var p1 = Module.HEAPF64.subarray(o1/8, o1/8 + 4);
     var t1 = ccall('proj_create_crs_to_crs','number',['number','string','string','number'],[_proj_context_create(),'EPSG:3586','EPSG:4326',0]);
     ccall('proj_trans_array','number',['number','number','number','number'], [t1, 1, 1, o1]);
     p1")
;; => #object[org.graalvm.polyglot.Value 0x497d8c6f "Float64Array(4)[34.24438675300125, -73.6513909034731, 1.0609979113e-314, 1.600083993565264e-303]"]

desruisseaux commented 2 years ago

Hello @willcohen. Given that WebAssembly is language-neutral and not particularly related to Java, should this work be in a new project, something like "PROJ-WASM"?