drubbo / SparkGIS

GIS extension for SparkSQL
Apache License 2.0

Extract geometry operators into separate library #10

Open dispanser opened 8 years ago

dispanser commented 8 years ago

I came across this project while looking not for Spark, but for a Scala wrapper/library for geometric operations.

Would it make sense to separate the geometry library from the Spark integration (UDF, ...) so it could be used on its own?

drubbo commented 8 years ago

Well, it should be straightforward to do: one would pop out GisGeometry, the conversions and operators, and probably the factory aspect of Geometry itself. The problem with respect to this project would be the dependency aspect; I'm not familiar with the publishing process for Maven Central.

If by "would make sense" you mean you'd need it, I'll tell you that my processing time for this is undefined, regardless of the amount of work involved. If you're in a hurry, feel free to take what you need and do whatever you fancy with it.

dispanser commented 8 years ago

I'd be willing to put in the effort, but I thought I'd first ask to make sure such a pull request would have a chance of being merged.

To get around the Maven Central problem, this project could simply become a multi-module project, with the SparkGIS module depending on the geometry module.

Alternatively, I'd just rip out the parts I need and deploy them locally for now; I'm not comfortable taking your work and publishing parts of it as a separate project under my account.
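A minimal sketch of such a layout, assuming the project builds with sbt. The module names (`geom`, `sparkgis`), directory paths, Scala version and dependency versions below are hypothetical placeholders, not taken from the repository:

```scala
// build.sbt (hypothetical layout; names and versions are assumptions)
ThisBuild / scalaVersion := "2.12.18"

// Pure geometry module: depends only on JTS, never on Spark.
lazy val geom = (project in file("geom"))
  .settings(
    name := "sparkgis-geom",
    libraryDependencies += "org.locationtech.jts" % "jts-core" % "1.19.0"
  )

// Spark integration module: UDTs, UDFs and SQL functions live here.
lazy val sparkgis = (project in file("sparkgis"))
  .dependsOn(geom)
  .settings(
    name := "sparkgis",
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0" % Provided
  )

// Root project aggregates both modules so one build covers them.
lazy val root = (project in file("."))
  .aggregate(geom, sparkgis)
```

With this layout the geometry module can be published on its own (for example `sbt geom/publishLocal` for a local deploy), while `sparkgis` picks it up through `dependsOn` without anything having to go to Maven Central first.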

drubbo commented 8 years ago

A multi-module project would be nice indeed. Thanks a lot!

dispanser commented 8 years ago

I've looked into this a bit, and I'm puzzled by the cyclic dependencies between Geometry and GisGeometry, and between Geometry and GeometryType.

What I'd really like to have in the separate geom library is Geometry (+ Operators, Conversion, ...) without any reference to Spark. The only Spark-related reference is the annotation:

@SQLUserDefinedType(udt = classOf[GeometryType])

which is most probably essential, but I have no clue about Spark or how to declare UDTs, etc.

Do you have an idea of how to resolve this?

drubbo commented 8 years ago

Geometry, GeometryType and Functions are Spark-specific and should mostly remain as they are. The mess you see is related to implicit conversions and extractors.

The GisGeometry hierarchy shouldn't be aware of gis.Geometry, and its various unapply methods should accept jts.geom.Geometry (aliased as Geom) instead.

The GeometryOperators can operate in terms of GisGeometry.

If the implicit conversion from gis.Geometry to GisGeometry is in scope, everything should keep working as usual.
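The layering described above can be sketched in plain Scala. To keep it compilable without JTS or Spark, `Geom`, `JtsPoint` and `JtsPolygon` are hypothetical stand-ins for the JTS types, the `geomlib` and `sparkgis` objects stand in for the two modules, and the operator names `isPoint` and `numPoints` are illustrative rather than SparkGIS's actual API:

```scala
import scala.language.implicitConversions

// --- Geometry library module: no Spark references anywhere in here. ---
object geomlib {
  // Hypothetical stand-ins for jts.geom.Geometry (aliased as Geom) and subtypes.
  sealed trait Geom
  final case class JtsPoint(x: Double, y: Double) extends Geom
  final case class JtsPolygon(shell: Seq[(Double, Double)]) extends Geom

  // GisGeometry hierarchy: wraps a Geom, knows nothing about gis.Geometry.
  sealed trait GisGeometry { def geom: Geom }
  final class GisPoint(val geom: JtsPoint) extends GisGeometry
  final class GisPolygon(val geom: JtsPolygon) extends GisGeometry

  object GisPoint {
    // Extractor accepts the raw Geom instead of the Spark-side wrapper.
    def unapply(g: Geom): Option[(Double, Double)] = g match {
      case JtsPoint(x, y) => Some((x, y))
      case _              => None
    }
  }

  object GisPolygon {
    def unapply(g: Geom): Option[Seq[(Double, Double)]] = g match {
      case JtsPolygon(shell) => Some(shell)
      case _                 => None
    }
  }

  object GisGeometry {
    // Factory: choose the GisGeometry subtype matching the underlying Geom.
    def apply(g: Geom): GisGeometry = g match {
      case p: JtsPoint   => new GisPoint(p)
      case p: JtsPolygon => new GisPolygon(p)
    }
  }

  // Operators are written purely in terms of GisGeometry.
  object GeometryOperators {
    def isPoint(g: GisGeometry): Boolean = g.geom match {
      case GisPoint(_, _) => true
      case _              => false
    }

    def numPoints(g: GisGeometry): Int = g.geom match {
      case GisPoint(_, _)    => 1
      case GisPolygon(shell) => shell.size
      case _                 => 0
    }
  }
}

// --- SparkGIS module: depends on geomlib and adds the Spark-facing wrapper. ---
object sparkgis {
  import geomlib._

  // Stand-in for gis.Geometry. The real class would also carry
  // @SQLUserDefinedType(udt = classOf[GeometryType]); omitted here so no Spark is needed.
  final class Geometry(val geom: Geom)

  object Geometry {
    // Implicit view gis.Geometry => GisGeometry. Defined in the companion object,
    // it is in implicit scope wherever a Geometry is passed where GisGeometry is expected.
    implicit def toGisGeometry(g: Geometry): GisGeometry = GisGeometry(g.geom)
  }
}
```

Placing the conversion in `Geometry`'s companion object is one option among several: it makes the conversion available without an explicit import, whereas defining it elsewhere would require importing it wherever `gis.Geometry` values are handed to the operators. Either way, the geometry module never needs to see `gis.Geometry`, which breaks the cycle.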