georust / geozero

Zero-Copy reading and writing of geospatial data.
Apache License 2.0
321 stars 30 forks source link

Clarification of `GeomProcessor` methods #182

Closed kylebarron closed 4 months ago

kylebarron commented 6 months ago

The documentation for the methods in the GeomProcessor trait is kinda sparse and that makes it hard to know what order of traits is guaranteed from data producers.

I'm trying to flesh out support in geoarrow for conversion from geozero to geoarrow arrays. Up until now, I've made the user choose a geometry trait when importing the trait, e.g. ToPolygonArray, and to some extent assume the inputs will indeed be polygons.

Now I'm trying to implement support from arbitrary-typed or mixed type items (see https://github.com/geoarrow/geoarrow-rs/pull/304), which means storing a piece of internal state for each geometry input so that when xy() is called, I know which geometry type that's associated with, and thus which internal geometry array to append to.

But one thing that isn't documented is when each method gets called, when it's a part of a larger geometry. So for example in the GEOS reader, point_begin gets called for a point type https://github.com/georust/geozero/blob/8501fedf288adc851d08a367f8f651ee643d4bed/geozero/src/geos/geos_reader.rs#L49 but not for a MultiPoint type or any other type. But linestring_begin gets called for all of line string, multi line string, polygon, and multi polygon types!

I guess this is just a feature request to flesh out these docstrings? It's hard to ensure conformance against the traits only by reading the code.

nyurik commented 6 months ago

@kylebarron valid point. I don't have an answer to your question, but a suggestion on how we can all chip in with this - could you start a pull request to improve the doc strings, adding a few of the ones you have figured out, and then the reviewers can make suggestions/changes to that? This way we can all collaborate on it before merging.