Closed SimonDanisch closed 4 years ago
Also, it looks good, nice work :-)
in for isinside?
I just really love the syntax point in geometry
, and I don't think it's dangerously ambigious, as long as we don't subtype it as AbstractVector. It seems intuitive to me, that one would do point in vertices(poly)
to check if the point is actually part of the vertices.
does a polygon need to know if it’s a hole?
I guess not, I was just still inspired by the StructArray method and thought that would be the clear path forward. But actually, best argument against it would be, that one has a non-hole polygon, and tries to insert it into a multipolygon as a hole - that'd be pretty inconvenient! I'll change it
does a polygon need to know if it’s a hole?
A case for this is because the simple feature standard defines polygons to support holes, taken from https://r-spatial.github.io/sf/articles/sf1.html
POLYGON geometry with a positive area (two-dimensional); sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring
So if we want to read simple features polygons from GeoJSON.jl or GDAL.jl, we cannot convert it to GeometryTypes polygons if there are holes.
No the idea would be to convert it to a MultiPolygon
, that's what that type is for. It allows multipart polygons at the same time. Would that be a problem?
I even think both MultiPolygon and Polygon could inherit from AbstractPolygon and have the same methods defined.
I just really love the syntax
Fair enough, it is nice. point inside polygon
would be nice too though.
we don't subtype it as AbstractVector
With "it" you mean the polygon? Because AbstractPoint <: StaticVector
right?
I do see the point in adhering to an ISO standard if it doesn't come at a cost in other contexts though. @visr would it be possible to implement geometries in accordance with the simple features standard in a way that is consistent with the GeometryTypes classes, and which relies could take advantage of the work you've already done in the Geo packages?
Yeah I'm not sure about it. Just wanted to point out that if we deviate from it here already, it will make interoperability more difficult. I don't think we can fix it by using multipolygons or this, since that is just a vector of polyons as a single feature.
If we can easily make sure this mostly works with simple features it's worth keeping an eye on that.
cc @yeesian
But FWIW I don't think we need to follow the same implementation details, as long as the types circumscribe the same information. Remember that SF had to be expressible in a very non-expressive language such as R. I mean wouldn't it be perfectly consistent with
struct ComplexPolygon <: AbstractPolygon
poly::Polygon
holes::Vector{Polygon}
end
struct MultiPolygon{T<:Union{ComplexPolygon, Polygon}} <: AbstractPolygon
polys::Vector{T}
bbox::Rect
end
etc
Thanks for pushing on this Simon and Michael!
does a polygon need to know if it’s a hole?
I guess not, I was just still inspired by the StructArray method and thought that would be the clear path forward. But actually, best argument against it would be, that one has a non-hole polygon, and tries to insert it into a multipolygon as a hole - that'd be pretty inconvenient! I'll change it
Yeah it feels more natural to me for the MultiPolygon to know whether a given Polygon is a hole or not, than for a Polygon to know whether or not it's the hole of a MultiPolygon.
But FWIW I don't think we need to follow the same implementation details, as long as the types circumscribe the same information. [...] I mean wouldn't it be perfectly consistent with
struct ComplexPolygon <: AbstractPolygon poly::Polygon holes::Vector{Polygon} end struct MultiPolygon{T<:Union{ComplexPolygon, Polygon}} <: AbstractPolygon polys::Vector{T} bbox::Rect end
etc
Why not go one step further and just say that Polygons have to implement a holes()
method, like
struct MultiPolygon{T, A <: AbstractVector{<: Polygon{T}}}
polygons::A
boundingbox::HyperRectangle{2, T}
end
holes(mp::MultiPolygon) = (p for p in mp.polygons if ishole(p))
struct ComplexPolygon{A <: AbstractVector{<: Polygon{T}}}
poly::Polygon
holes::A
end
holes(cp::ComplexPolygon) = cp.holes
Then people can define whatever Polygons they want, and you can define in()
as:
function in(point::AbstractPoint, polygons)
for polygon in holes(polygons)
point in polygon && return false
end
# [...]
end
Btw I'm not sure the current implementation of in()
for multipolygons is entirely correct. You might want to be careful in testing it first.
Btw it's not a deal-breaker, but it seems to me the current implementation of Polygon can't seem to decide whether it wants to be a Shapefile.Polygon or GeoJSON.Polygon or LinearRing or its own thing. I'll elaborate later when I find the time.
Update: I'm going to postpone the discussion on this until after we agree upon a common geometry model in https://github.com/JuliaGeometry/GeometryTypes.jl/pull/166#issuecomment-460484813, since I imagine we want to see something like
Polygon <: ... <: AbstractGeometry
at some point.
It wants to be all of these IMHO.
Why not go one step further and just say that Polygons have to implement a holes() method,
makes sense
A few refactoring ideas:
Cool. I think I agree with a lot of what @yeesian has written already. I see this takes the Shapefile approach - the "multipolygon" we've been using is a collection of complex polygons.
(As a side note, this makes me feel that we should really share the code we've written...)
The end goal here is to have a uniform and lightweight representation for geometries throughout the ecosystem. Compasability is a goal, and it would be nice to do as much as possible with concrete types as 1) they shouldn't be too controversial for this case, 2) it's more robust and 3) it will avoid copying/conversions. So for sure it should be at least immediately compatible with Shapefiles, though not chained to that if that causes problems somewhere else.
we should really share the code we've written...
Now would seem like a good time to do it, I do agree :-)
refactoring ideas:
I don't care so much but my view is that most operations could stay in this package in a separate files, since they rely on the types here and don't involve further dependencies. Also the names shouldn't conflict with names elsewhere. But I'm curious what it would take for @andyferris to think of this as basic enough?
I'd like for us to move towards a single model of geometries that we can all work with [1]. It doesn't have to be comprehensive, but it has to be agreed upon. I'll like for such a definition to be sufficiently straightforward that something like Polygon <: ... <: AbstractGeometry
is made possible.
We can add more methods to it as we go, but I want to be prudent/conservative in terms of what is required (versus nice-to-have). I'll be happy to flesh out a more detailed description of such a geometry type if we are all in agreement.
Without further ado: may I propose the following starting point for all of us?
We start with a single AbstractGeometry
type. Any subtype of an AbstractGeometry
is required to implement
eltype(::AbstractGeometry)
ndims(::AbstractGeometry)
copy(::AbstractGeometry)
I have some remarks (based on my reading of the current GeometryTypes codebase):
For some classes of geometries (e.g. circles), it is unclear what vertices()
and nvertices()
should mean, so I left them out of the basic definition. (Even in those cases where it is appropriate, I would prefer something like ncomponents(geometry)
, component(geometry, i)
and components(geometry)
instead.) But someone can go ahead and implement a subtype like AbstractIterableGeometry <: AbstractGeometry
for it.
In addition to iterable geometries (c.f. point 1), there can also be a AbstractMutableIterableGeometry <: AbstractIterableGeometry
which requires methods like push!(geometry, point)
and deleteat!(geometry, i)
.
Someone can go ahead and implement a parametric subtype that looks like e.g. AbstractParametricGeometry{T,N} <: AbstractGeometry
. But it is not required for AbstractGeometry
. I am not comfortable about committing to a parametric abstract type too early in case it makes it too onerous for wrappers of opaque C objects.
@SimonDanisch @mkborregaard @andyferris @visr (and anyone else following this thread with a vested interest in its outcome, thumbs up to this comment if it does not pose a problem for your model of geometries. Otherwise, provide a description of a case where it might be problematic.)
[1]: My personal preference is for us to move towards a trait-like interface of working with geometries. But my understanding from various discussions is that there is still reluctance about it, and greater enthusiasm for building a unified type hierarchy of geometries.
All of the discussion above is based on my reading of AbstractGeometry
, GeometryPrimitive
, and AbstractFlexibleGeometry
. Since GeometryPrimitive
is a subtype of AbstractGeometry, I'm going to leave it out of the rest of the discussion in this comment.
For AbstractGeometry
, I observe the following methods:
eltype(geometry)
ndims(geometry)
nvertices(geometry)
decompose(geometry)
is not crucial to our desired definition of a geometry)For AbstractFlexibleGeometry
, I observe the following methods:
eltype(geometry)
vertices(geometry)
nvertices(geometry)
(synonymous with Base.length(geometry)
)push!(geometry, point)
deleteat!(geometry, i)
copy(geometry)
My guess from the docstring of AbstractFlexibleGeometry{G}
and implementations of current subtypes of AbstractFlexibleGeometry is that
AbstractFlexibleGeometry{G} ≈ AbstractVector{G} where G <: AbstractGeometry
Therefore, the current type hierarchy is meant to be something like
AbstractGeometry
AbstractImmutableGeometry <: AbstractGeometry
AbstractMutableGeometry <: AbstractGeometry
with the following set of methods:
eltype(::AbstractGeometry)
ndims(::AbstractGeometry)
vertices(::AbstractGeometry)
nvertices(::AbstractGeometry)
copy(::AbstractGeometry)
push!(::AbstractMutableGeometry, point)
deleteat!(::AbstractMutableGeometry, i)
Is that a reasonable reading? What might I be missing?
Thanks for writing it up @yeesian! Regarding:
[1]: My personal preference is for us to move towards a trait-like interface of working with geometries. But my understanding from various discussions is that there is still reluctance about it, and greater enthusiasm for building a unified type hierarchy of geometries.
It might be good to list some pros and cons of traits vs type hierarchies for this purpose. I'm more familiar with type hierarchies than traits, so that's what I mainly spoke about. But the success of Tables.jl in providing a common interface to many different table-like types might be transferred to geometries?
So the way I see it there are three possible levels of integration: shared concrete types, abstract types with an informal interface, and traits.
The advantage with concrete types is simplicity the way I see it. Everybody writes functions for the same types, it's easy to reason about what a type contains in a function, and it's thus much easier to write code that is optimised for performance. The issue with abstractions is that they may hide a non-performant implementation, which is exactly why specialized methods throughout the ecosystem tend to be faster. Another advantage is that it avoids copying because you never have to convert types. Another advantage is that if using the same basic concrete types becomes widespread throughout the ecosystem, you will have packages working together without ever having to align anything, or even plan for it. For instance if all plotting packages used these basic types it would be possible to plot shapefiles with any package without needing recipes.
The big disadvantage of course is that you have to agree on the type implementation, so it must be possible to define an implementation that is actually ideal for all purposes; also refactors can be much more complicated if you don't make sure to still code mainly with abstractions such as getter functions.
The advantage of abstract interfaces is that it allows you to have different types and still code to them under the same logic. This is extremely useful if you need to have different types. While abstractions are a lot easier to agree on and implement throughout the ecosystem, it also comes at the cost of unclarity. Say I load a shapefile with Shapefiles which is just julia Vectors of Points, and one with GDAL which wraps a pointer to a GDAL object, then uses LibgGEOS to intersect them - what type of Polygon should I then have? Do I end up with a mix of pure-Julia objects and C pointers etc in my workspace? And how many implicit copies happened by conversion in the process? (Let me know if I misunderstand any of the relationships here).
To be clear what I suggest is having both - i.e. to have an abstract interface, but also attempt to use the same concrete types where possible.
The advantage of traits is that it gives you many of the same advantages as an abstract interface but 1) it allows you to add traits to a type after it has been created (you can't really add a non-package-specific trait to a type created in a different package though, that would be type piracy), and it does not interfere with the existing type hierarchy, i.e. it allows you to have a semblance of multiple inheritance. It's also less simple, and IMHO clunkier to program with than an abstract interface.
Traits are great exactly for something like Tables, because it is an explicit design goal of the Tables ecosystem that there should be many types of Table. Some types are faster for row access, some for columns access, some for big persistent data sets in many different files, some for type stability etc etc. The idea is that the user should choose exactly the type of Table that is best for his/her use case, then be able to apply all the same operations on it. To my mind though that's a very different situation from the Geo case, where we need coherence more than pluralism imho.
I do have a question about design though, that relates to Geos and GDAL etc., given that they are the core of the geographical ecosystem in many languages. Are the C++ objects they use the same, or interchangeable? Or do you need converting? So given that there is already a huge C++-based geography ecosystem, I see the point of letting that do all the work, and having Julia types simply wrap pointers that they pass around, then applying all functions as translations to the C++ interface and passing to the libraries. In this way Julia is used as a scripting language in exactly the same way as Python or R, and behaving the same way. In principle the implementation and experience could be the same, or it could be different if you guys have ideas that are better than the python developers out there (I bet you do). It would be a gated community, e.g. you convert everything to C++ pointers at the door then keep passing those objects around.
That's fine - julia is perfectly great as a scripting language. But it also has more powerful aspects - so then the question is - is there also space for a (simpler) Julia representation, , with julia types and deep integration with the rest of the ecosystem? I think so, which would be what we would try to build here. The ideal would be if these julia types could use geos and gdal natively, and I think I understood from Simon that it might be possible to define the julia types so they end up with the same memory representation as C types, but it's unclear to me how feasible this would be.
So maybe the most feasible other possibility is to have one ecosystem based on concrete Julia types, another (like what you already have) wrapping the same C++ types (if all the c++ objects are compatible), all connected by an abstract interface?
Thanks for putting it all together @mkborregaard! It is indeed what I had been mulling over.
Everyone (myself included) would prefer to not have to work with opaque objects "hidden" through API calls. (It is possible that GDAL objects one day might become C structs in their C API: we can then work with them more natively through Julia. Or Julia might have tighter integration with Cpp and other solutions for working with them might become possible. In any case, these scenarios have not happened.) But there are also lots of advantages to rely on and contribute back to a broader ecosystem based upon common C++ geospatial libraries. Again, most of these libraries are MIT-licensed, so there's really nothing stopping anyone from doing a port of these libraries to Julia.
But practically, what are the benefits of having a single concrete Julia geometry type? There are some that already exist in GeometricalPredicates.jl. Apart from disagreements over what such a geometry should look like, they commonly lack comprehensive functionality. (What is the point of reading in a GDAL geometry, and then constructing a Julia geometry type from it, just to realize that the Julia geometry doesn't support a particular operation, and then constructing another GDAL geometry so that you can call a GDAL function, and then constructing back another Julia geometry as the output of that operation, etc. It's just tons of unnecessary back-and-forth.) Effort will need to go into implementing (and maintaining) what was implemented and is currently being maintained for the C++ libraries. From what I've seen of other similar OSS, these are very seldom community-driven, and almost always borne of a single (or a few) developer(s). It is true that they will be easier to implement/maintain in Julia -- but who's willing to do that work, and who will be it funded/supported by?
The way I see it, it is possible to build Julia packages on top of existing C/C++ libraries (e.g. the JuliaOpt ecosystem) that are performant and easy to use. Whether or not anyone wants to take it upon themselves to replace those C/C++ libraries with their own packages is their own crusade. And as I see it, the way to future-proof Plots.jl or other packages that wants to work with geometries is to provide an interface for interacting with geometries (regardless of their specific implementation/structure). Which I think is what this discussion is all about. From that perspective, insisting that we have a single concrete julia geometry is a step backwards.
A single concrete geometry type is sadly unworkable. I'll be in favor for a shallow inheritance tree that other packages can use, with a few concrete implementations, and while that works, we can start hardening our interfaces defined on the abstract types and spice them up with traits ;)
So packages that are free in their types can use the concrete implementations, and C libraries with weird stuff happening should define the interface & some conversion functions to make interop seamless, but also make it easy to directly work with the c-types (or especially C++).
I do have a question about design though, that relates to Geos and GDAL etc., given that they are the core of the geographical ecosystem in many languages. Are the C++ objects they use the same, or interchangeable? Or do you need converting?
Currently conversion happens internally (via wkb) in the GDAL library for GEOS operations. It's not ideal, but I've heard the argument (don't rmb where) that it is a cheap operation.
A single concrete geometry type is sadly unworkable.
Could you elaborate on this point? What is the inheritance tree you're suggesting?
well the points from @yeesian ;) Even though it'd would be wonderfull to have one type that can rule them all, we can't really plan with it since there will definitely reasons for different layouts. Buy shallow, I mean just containing as few subtypes as possible ;)
OK but then I don't think we disagree. The idea that it sounds like we're all converging on is to have 1) an ecosystem based around the C libraries that try to be consistent without copying among all the C libraries and 2) simple Julia-based implementations of a subset of the functionality, like Shapefiles.jl, that try to use the same concrete geometry that people like me who might go on a "crusade" to port clipper could work with, and both of these satisfying the same abstract interface, using traits where that is most convenient? And with good conversion functions but a philosophy to mostly not have to convert.
That sounds perfectly in sync with what I meant ;)
I don't know GeometryTypes very well but to me @yeesian 's scetch of a type hierarchy sounds reasonable. Philosophically I'm inclined to try to make it as simple as possible.
I mean, the simplest possible thing to do is to just define a
abstract type AbstractGeometry end
and to let other package developers interpret what they will of it, haha.
Multiple subtype hierarchies might exist and nothing is guaranteed to work across subtype hierarchies, but that can be left for subsequent discussions to resolve rather than holding up this PR. (My proposal was just to kickstart that conversation, but it doesn't have to happen here.)
I think we should have at least
AbstractPoint <: AbstractGeometry
AbstractLineSegment <: AbstractGeometry
AbstractPolygon <: AbstractGeometry
might perhaps also be good to have
AbstractComplexPolygon <: AbstractPolygon
AbstractMultiPolygon <: AbstractFlexibleGeometry{<:AbstractPolygon}
AbstractMultiLine <: AbstractFlexibleGeometry{<:AbstractLineSegment}
Not wedded to the design just trying to circumscribe some types. I agree with @yeesian that it's only necessary to have parametric abstract types when it is needed in order to write specialized functions using the interface.
(In case it is of interest, there's now some polygon code (and more) at https://github.com/FugroRoames/RoamesGeometry.jl)
Nice discussion.
I do tend to agree with the above that a package defining an abstract type hierarchy for geometry, with a few simple Julia concrete types, is the way to go. We just need to flesh out the abstract interface functions we use to interact with such geometries to the correct level of granularity.
A couple of points on the abstract interface functions - if we use functions from Base
, they have to do what they mean in Base
:
eltype
is the one that tells you the type for in
and iterate
and so-on. To me, an ND geometry is a set of ND points. I would say the eltype
of a Sphere
, say, is some kind of point, and I would frequently query if point in sphere ... end
.ndims
is an AbstractArray
interface function - beyond that a library writer might extend it to describe how getindex
works (for a higher-dimensional dictionary, say). Let's please not overload it, since this will be a mess. If a 3-vector is 3D point, we still have ndims(::AbstractVector) = 1
.Finally, I find that when I want to mutate a geometry, I tend to know its concrete type. If you want to push!
points to a linestring, say, then I'd just push!(point, linestring.vertices)
(or else push!(point, vertices(linestring))
if we decide to have a vertices(::AbstractLineString)
interface function). So I wouldn't spend effort adding an abstract type heirarchy for mutability when a few traits and abstract getters will do the job well.
Good points!
This is a detail but just to check (as it is relevant for the PR): Would you expect if point in sphere == true
if the point was inside the sphere, or if it were on the edge? Also, is a requirement of in
that you can do for point in sphere...
which is undefined here?
Answers to my own questions. 1 you mean within, like in the pr. 2. Is not a problem, as sphere does not need to be iterable. Nice package, really like the structure
@SimonDanisch @mkborregaard is this ready to merge / are there any TODOs that we could pick up? @visr et al were considering moving over to GeometryTypes once it's ready.
Yeah I wonder. I see that there is a Polygon in GeometryBasics.jl. Should we integrate with that package or do you think it will still be merged into this one?
I plan to switch everything to GeometryBasics :) Maybe we can make it feature complete together? It's already in a pretty good shape, but still needs a few algorithms copy & pasted over from GeometryTypes & more tests and docs!
Ok good to know. In that case I guess we best try hopping on the basic bandwagon straightaway and see what we run into ;)
Ok don't hesitate to spam me with questions ;) I'd love to get this all to work nicely together!
I would like to implement this paper for fast point-in-poly.
Do we want to move GeometryBasics to this org?
@sjkelly I moved it! :)
so awesome - action on this PR :-)
I'm going to merge this, even though its incomplete :D I still plan to move the "real refactor" into GeometryBasics.
I see you’ve merged and refactored. Questions:
in
forisinside
? That would prevent us from exporting the function. And if we make it a method ofBase.in
that is an ambiguous use of the function I'd say. You might intuitively expect that to meanin(pt, vertices(polygon))
.Plan to factor out colortypes too?