go-spatial / geom

Geometry interfaces to help drive interoperability within the Go geospatial community
MIT License
168 stars 37 forks source link

types like `type LineString = []Point` #2

Closed paulmach closed 6 years ago

paulmach commented 6 years ago

Thank you starting this project. I agree that there should be interoperability within the Go geospatial community. I'm not really sure what that looks like, but I think a set of defined types that everyone uses is a good start.

I am suggesting/wondering why you didn't choose to do something like:

type Point [2]float64
type MultiPoint []Point

type LineString []Point
type MultiLineString []LineString

type Ring LineString
type Polygon []Ring
type MultiPolygon []Polygon

The reason I like this is because you can do something like this with no type assertion and you still get the safety of knowing it's a Ring vs a MultiPoint vs a LineString:

poly = Polygon{....}
outerRingLength = lonlat.Length(poly[0])

or

mls = MultiLineString{ls}
mls = append(mls, otherLineString)

It feels really natural to me because you can use all the build ins like len, append and [:]. If you want to explicitly convert a LineString to a Ring you can just do it Ring(ls)

I have an attempt in paulmach/orb that went one step further to separate planar/projected types from geo/earth types so that if you did line.Length() it would use the correct math. But that was a pain to work with. One you were forced to copy on project. Two, without generics the code to "crop to bounding box" (which is basically the same in both domains) required copy pasting or slowing it all down by wrapping it in a weird/unnecessary interface.

The demo/version I like now is at paulmach/geo. It has one set of geo.* types and then planar and lonlat sub packages for specific operations that are different, e.g. Area, Length, DistanceFrom, etc. If you have two points p1 and p2 the distances the code would look like:

planar.Distance(p1, p2)
lonlat.Distance(p1, p2)

While it's not type safe, it still reads really well, and building stuff like convexhull doesn't require copied code for the different types. The result is in the same space as the input.

Anyway, this is stuff I've been thinking a lot about lately and from this repo and twitter it seem like you guys have to. Interest to know if you have any thoughts.

ARolek commented 6 years ago

@paulmach Thanks for opening the discussion. I'm familiar with your go.geo and go.geojson packages and I have been planning on reaching out to you as we test these types to get your opinion. You beat me to it ;-)

As @gdey and I have been working on tegola and researching the go spatial community at large it was apparent that a set of interfaces could really help drive interoperability. Converting between various package types is a pain point and essentially the same types are defined again and agin. While working through the first few iterations on this package (and it's still being worked on) the primary goal came down to:

"Help drive interoperability via interfaces without needing to adopt the geom package types."

We set out with this goal so current packages can maintain backwards compatibility, and with the addition of a few methods become interoperable with other packages.

I am suggesting/wondering why you didn't choose to do something like:


type Point [2]float64
type MultiPoint []Point

...


We tried this approach with the initial design but ultimately it failed the objective of the package. The issue does not present itself until you need to start working with a slice of interfaces. In Go, typically you accept interfaces and return types (which implement interfaces). Slices unfortunately don't work this way so if you want to return a slice of Points that would adhere to the `geom` package `Pointer` interface you would need to return a `[]geom.Pointer` and then internally assert back to your own type! 

With this blocker we had to try a different approach which ultimately led us to the current design. By using arrays / slices / multi-dimensional slices of `float64` primitives, we're able to define a set of interfaces that can be leveraged in any package without needing to import the `geom` package. The `geom` package also comes with types which can have additional utility methods (tbd) to make working with that type easier.

Regarding projection systems, that's still a place that I feel needs consideration. We have discussed adding an SRID() method to the base geom.Geometry interface so the point data comes with CRS context. 

We're still very open to suggestions / ideas and want to make this a community effort. I'm not sure if there is a perfect answer, but it's good to know others are also thinking about how to best address this problem. Now that you know more about the design decisions I'm interested in hearing your feedback. 
paulmach commented 6 years ago

Thanks for the reply, I understand the need for a shared interfaces to say for example make an r-tree. The structs would all need to implement something like Bound() [2][2]float64 and it'd be great if that was standard. But why not Bound() geom.Bound why always go back to the [2]float64?

If I were to implement these interfaces in my projects I'd have to create some functions like Verticies() [][2]float64 that would copy my types to the [][2]float64 form. Then I would realize that this is an unnecessary memory copy and then change all my code from my []Point to [][2]float64 which would move me backwards in type safety.

Or, I just realized that there are types in this repo too, you could just use the geom.LineString type directly (since it implements the interfaces), but then why not make it Verticies() geom.LineString and better still make type geom.LineString []geom.Point?

until you need to start working with a slice of interfaces

Can you provide a more detailed example? I understand that converting slices of types to slice of interfaces is a pain, but I don't understand where that comes up in practice when everything is a type.

paulmach commented 6 years ago

For kicks I tried to see how tegola would use the types defined in paulmach/geo. Interestingly this is already happening in tegola/basic so alot of the changes were just renaming the types and going directly to them vs via the interfaces.

The more interesting stuff was in the wkb package. It seems there are basic.* types and wkb.* types that both implement the interfaces in tegola.*. I refactored so wkb decoded into geo.Geometry (an interface) and the associated types directly. Decoding to 1 type is already hinted to in the code comments.

The diff can be found here. There a number of ways to still "cleanup" the code by unexporting the wkb.* types, which are just implementation details now. This would leave a really nice package interface:

func Encode(w io.Writer, bom binary.ByteOrder, geometry geo.Geometry) error
func WKT(geo geoa.Geometry) string
func Decode(r io.Reader) (geo.Geometry, error)
func DecodeBytes(b []byte) (geo.Geometry, error)

I updated the rest of the tegola repo as well and tests passed. Here is the full diff.

With all this I wanted to see if what you guys were doing was possible with the type as defined in paulmach/geo. It seems like yes. With the summary being instead of wkb decoding into special types that implement the interfaces, decode to the types directly.

So how does this impact "interoperability within the Go geospatial community"? I don't think types vs. interfaces makes a difference on if a community is possible. Either way you have to standardize on something, and I would like it to be the "best" thing because that will bring people in. Projects like turf do something similar. They standardize of the GeoJSON types and then have subprojects/directories for all the different things you can do. Something similar could work in go.

ARolek commented 6 years ago

@paulmach thanks for the in depth response. I'm going to break this down into pieces for the response:

... The structs would all need to implement something like Bound() [2][2]float64 and it'd be great if that was standard. But why not Bound() geom.Bound why always go back to the [2]float64

The design objective has been to not force importing the geom package. By going back to primitive structures ([2]float64) no types package needs to be imported.

If I were to implement these interfaces in my projects I'd have to create some functions like Verticies() [][2]float64 that would copy my types to the [][2]float64 form. Then I would realize that this is an unnecessary memory copy and then change all my code from my []Point to [][2]float64 which would move me backwards in type safety.

This demonstrates precisely the objective of the interface strategy. Rather than force types on the package author for interoperability, they can decorate their types to comply with the interfaces. This allows the package author to be interoperable without needing to overhaul their package. If they choose to adopt the core types, they can do that as well, but it would not be required. Regarding type safety, why do you believe this moves you backwards?

Can you provide a more detailed example? I understand that converting slices of types to slice of interfaces is a pain, but I don't understand where that comes up in practice when everything is a type.

Here's a quick example which demonstrates the difficulty of working with interfaces and slices. Since spatial data types are inherently arrays & multidimensional slices the interface strategy breaks down when using custom types. As you mentioned, if everything is a type this issue does not exist.

Interestingly this is already happening in tegola/basic so alot of the changes were just renaming the types and going directly to them vs via the interfaces.

tegola recently reached an end to end implementation at which we point we started reevaluating the type situation. As you have noticed we have basic.* types, wkb.* types and we also have some interfaces in the basic package. The interface strategy was breaking down per the example I posted above, which is causing a lot of unnecessary allocations. The situation is even more revealing when you dig through the the maths package where the types have pretty much been entirely discarded and the packages are working with multi-dimension slices of float64 pairs (and sometimes multi-dimensional slices of Points, to be cleaned up).

With all this I wanted to see if what you guys were doing was possible with the type as defined in paulmach/geo. It seems like yes. With the summary being instead of wkb decoding into special types that implement the interfaces, decode to the types directly.

I appreciate you taking the time to dig through the code and test this implementation. I agree that decoding into special types is unnecessary here.

So how does this impact "interoperability within the Go geospatial community"? I don't think types vs. interfaces makes a difference on if a community is possible.

I believe that ultimately both types and interfaces need to be defined. Without providing a set of interfaces every package that wants to be part of the community must import and use the community types. Some packages may have different implementations, or different types than the community adopted ones. Other packages may discover the community after that package has been authored. By providing both a set of types and interfaces the package author can either leverage the types or decorate their types with the community interfaces without the need to rewrite their package.

This brings us full circle. Since the inherit nature of spatial data types are multi-dimensional slices and since Go and slices of interfaces are a pain to work with, we end up defining the types as multi-dimensional slices of float64 pairs so we can have easy to work with interfaces.

So in summary, if we want to help drive interoperability, do we require the community to adopt a set of types, or allow them to play nicely with other packages via interfaces?

paulmach commented 6 years ago

Thanks again for the thoughtful response, reply below:

no types package needs to be imported

No import is required to implement the interfaces, but an import is required to use them. So you'll have to import the core package at some point. That is why in orb I keep the core super light and have subpackages for complex operations that are imported as needed.

Regarding type safety, why do you believe this moves you backwards?

Not at a first order, but a second order. Sure to copy my data into [][2]float64 isn't a problem. But if I want to remove the need to copy I'll need to store it internally as a [][2]float64. Then when I work with it internally I lose the type safety.

The situation is even more revealing when you dig through the the maths package

I didn't see anything in tegola that would benefit from using interfaces over concrete types, shared by the community or basic.*. I guess some new stuff got merged into master a few hours ago, haven't looked at that yet.

I believe that ultimately both types and interfaces need to be defined.

Yes, but a "path/line" is not defined by an interface with a function returning [][2]float64 but by the type type LineString []Point. The types and interfaces as currently defined blur this.

You will totally need interfaces, but not to define the core types. Here are examples where you will absolutely need to use interfaces:

However, there are many operations that don't require interfaces and just the types like simplification, finding the area, intersecting polygon, etc. basically anything that works on the geometry alone. Using the shared interfaces as currently defined, they would accept the interface or [][2]float64. Both of which can be improved upon, in my opinion.

So in summary, if we want to help drive interoperability, do we require the community to adopt a set of types, or allow them to play nicely with other packages via interfaces?

I think we're going to have to agree to disagree here.

As for me I think I will continue to work on orb. Going from two sets of types to one was a good tradeoff and I think the api reads nice now so I'm going to try and keep the api stable and just add new subpackages as I need them. Thanks again for taking the time to respond.

gdey commented 6 years ago

Closing, revisit if new information becomes available.