geoarrow / geoarrow-java

Scope of minimal implementation #1

Open paleolimbot opened 4 months ago

paleolimbot commented 4 months ago

I scraped together all the Java I remembered to put together an implementation for a GeoArrow data type (i.e., a representation of all the memory layouts currently in the spec). I also made a stub of ArrayData (a data type and a collection of buffers).

Other than those, I am not entirely sure of the scope of what GeoArrow in Java should do: things like JTS and the Arrow Java bindings should almost certainly be connected; however, I imagine that those connections should be scoped such that somebody can use this to do something useful without either.

I'm happy to help review any of this or help port parts of other bindings into Java.

cc @jiayuasu

jiayuasu commented 4 months ago

I agree with your suggestion @paleolimbot on adding JTS and Arrow-Java as dependencies.

And the implementation should somehow be designed in a way that Sedona can easily re-use it (with slight modification) to generate the native encoding in GeoParquet.

Any opinions? @Kontinuation @zhangfengcdt

Kontinuation commented 4 months ago

We need (1) Arrow schema definition of various geometry types and (2) conversions between arrow and JTS geometry from the minimal implementation. Parquet reading/writing could be handled by Sedona and does not need to be part of the minimal implementation.

paleolimbot commented 4 months ago

> Arrow schema definition of various geometry types

Naturally! I assume the output here would be an ArrowType or Field but there would also be org.geoarrow.core.DataType that implements the GeoArrow-specific bits.
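
As a rough illustration of the "GeoArrow-specific bits", the sketch below shows one way a type could report its extension name and produce the key/value metadata an Arrow field would carry. The `GeoArrowType` enum and its `fieldMetadata` method are hypothetical stand-ins, not the actual `org.geoarrow.core.DataType` API; the `geoarrow.*` extension names and the `ARROW:extension:*` keys follow the GeoArrow and Arrow conventions. It uses only the Java standard library, so wiring the map into an Arrow `Field` is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the GeoArrow-specific part of a data type. */
enum GeoArrowType {
    POINT("geoarrow.point"),
    LINESTRING("geoarrow.linestring"),
    POLYGON("geoarrow.polygon"),
    MULTIPOINT("geoarrow.multipoint"),
    MULTILINESTRING("geoarrow.multilinestring"),
    MULTIPOLYGON("geoarrow.multipolygon"),
    WKB("geoarrow.wkb"),
    WKT("geoarrow.wkt");

    private final String extensionName;

    GeoArrowType(String extensionName) {
        this.extensionName = extensionName;
    }

    String extensionName() {
        return extensionName;
    }

    /**
     * Builds the field-level metadata that marks a field as this extension type.
     * The caller supplies the extension metadata as JSON text (for example "{}").
     */
    Map<String, String> fieldMetadata(String extensionMetadataJson) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("ARROW:extension:name", extensionName);
        metadata.put("ARROW:extension:metadata", extensionMetadataJson);
        return metadata;
    }
}
```

A call such as `GeoArrowType.POINT.fieldMetadata("{}")` yields a two-entry map; a real implementation would also need to describe the storage layout (struct, list of struct, and so on) for each type.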

> conversions between arrow and JTS geometry from the minimal implementation.

Got it! Off the top of my head, this would probably go from arrow ValueVector -> org.geoarrow.core.ArrayData (simpler...a geoarrow type plus the relevant buffers) -> JTS. My intuition would be to implement something like an ArrowJTSVisitor interface that would let the implementation call a Java method for each JTS Geometry in the output (the other option would be to build a Vector or array of JTS Geometry).
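
The visitor idea can be sketched without pulling in JTS or Arrow. Everything named here is hypothetical: `GeometryVisitor` stands in for the proposed ArrowJTSVisitor, `PointArrayData` stands in for `org.geoarrow.core.ArrayData` restricted to an interleaved xy point buffer, and the geometry type is left generic so that a real version could plug in a JTS `Geometry` factory instead of the string used below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical callback invoked once per decoded geometry. */
interface GeometryVisitor<G> {
    void visit(int index, G geometry);
}

/** Hypothetical stand-in for ArrayData holding interleaved xy points. */
final class PointArrayData {
    private final double[] xy; // x0, y0, x1, y1, ...

    PointArrayData(double[] xy) {
        if (xy.length % 2 != 0) {
            throw new IllegalArgumentException("interleaved xy buffer must have even length");
        }
        this.xy = xy;
    }

    int length() {
        return xy.length / 2;
    }

    /** Decodes each point with the supplied factory and hands it to the visitor. */
    <G> void accept(BiFunction<Double, Double, G> factory, GeometryVisitor<G> visitor) {
        for (int i = 0; i < length(); i++) {
            visitor.visit(i, factory.apply(xy[2 * i], xy[2 * i + 1]));
        }
    }
}

final class VisitorDemo {
    /** The "build a list" option expressed on top of the visitor. */
    static List<String> collectAsText(PointArrayData data) {
        List<String> out = new ArrayList<>();
        data.<String>accept(
            (x, y) -> "POINT (" + x + " " + y + ")", // a JTS factory call would go here
            (index, geometry) -> out.add(geometry));
        return out;
    }
}
```

One reason to prefer the visitor as the primitive is that collecting into a list or array can be layered on top of it, as `collectAsText` does, whereas the reverse forces every caller to materialize all geometries at once.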

> should somehow be designed in a way that Sedona can easily re-use it (with slight modification) to generate the native encoding in GeoParquet.

I don't know if you are using Arrow's scanner or not (or heavily relying on JTS geometries internally), but the org.geoarrow.core.ArrayData should let you create something that can be converted to JTS Geometries in the design I had in mind.

I think there will probably be another issue with CRS handling. Either we need to implement a way to create PROJJSON in Java or allow some other CRS representation in the GeoArrow specification (or maybe you have this part already from working with GeoParquet). I think consuming PROJJSON is quite a bit simpler but would have to be implemented either way.

msbarry commented 3 months ago

Hello! Would you need to implement a PROJJSON parser as well? Or does one exist for Java somewhere else? I'm trying to add GeoParquet support to planetiler and am having a hard time finding one.

paleolimbot commented 3 months ago

Technically all we need here is an interface that a CRS-like object can implement that gets us PROJJSON; however, I think this repo would be a good home for PROJJSON <-> WKT2 conversion (I gather that even WKT2 doesn't have excellent support in Java, although that may have changed).
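
A minimal sketch of such an interface might look like the following. The names `ProjJsonSource`, `LiteralProjJson`, and `CrsMetadata` are invented for illustration and are not part of geoarrow-java; the only assumption taken from the GeoArrow specification is that the extension metadata is a JSON object with a `crs` member. No PROJJSON parsing or validation is attempted: the text is passed through as supplied.

```java
/** Hypothetical interface: anything that can describe itself as PROJJSON. */
interface ProjJsonSource {
    /** Returns a PROJJSON document as JSON text, or null if the CRS is unknown. */
    String toProjJson();
}

/** Wraps PROJJSON text obtained elsewhere, for example from file metadata. */
final class LiteralProjJson implements ProjJsonSource {
    private final String json;

    LiteralProjJson(String json) {
        this.json = json;
    }

    @Override
    public String toProjJson() {
        return json;
    }
}

final class CrsMetadata {
    private CrsMetadata() {}

    /**
     * Builds extension metadata JSON text with a "crs" member.
     * An absent or unknown CRS produces an empty object.
     */
    static String extensionMetadata(ProjJsonSource crs) {
        String json = (crs == null) ? null : crs.toProjJson();
        return (json == null) ? "{}" : "{\"crs\":" + json + "}";
    }
}
```

With this shape, a CRS class from any library could participate by implementing the single method, and a PROJJSON <-> WKT2 converter could be added later as just another `ProjJsonSource` implementation.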