apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Docs] Link to GeoArrow from canonical extension types docs #35800

Closed by wjones127 1 year ago

wjones127 commented 1 year ago

Describe the enhancement requested

The GeoArrow extension types are defined outside the project. I'm not sure whether they will always be governed separately (that seems fine to me). Regardless, it would be nice to point to them from our list of extension types.

Component(s)

Documentation

wjones127 commented 1 year ago

cc @paleolimbot what do you think?

paleolimbot commented 1 year ago

Tagging @jorisvandenbossche since he wrote most of them!

Right now the repo that hosts them contains some misleading information, and the memory layout has at least one unresolved PR. It's on our TODO list to fix both of those, after which there will be something better to link to (although maybe linking to the work-in-progress would be useful, too).

The canonical extension types didn't exist when the GeoArrow spec started. We could also consider adding the GeoArrow types to Arrow as a long-term home, which might solve some things like governance of the spec.

wjones127 commented 1 year ago

> Tagging @jorisvandenbossche since he wrote most of them!

Right. Sorry @jorisvandenbossche, I should have known that 🤦

jorisvandenbossche commented 1 year ago

No problem ;)

I think it would be good in general to have some place to list known "community" extension types (in addition to the official "canonical" extension types). I would be fine with listing them on the same page, as long as we make the distinction clear.

And indeed it would be good to then include the GeoArrow ones in such a list.

As Dewey mentioned, one reason the GeoArrow types are not canonical extension types is simply that the concept didn't exist yet when we started GeoArrow. We also haven't made the effort to make them canonical. I am not fully sure whether it would be better to do that (already). I think we probably want to stabilize them a bit more (and finish the memory layout discussions). It's also not clear to me whether we could keep the spec in a separate repo/GitHub org if we made it part of Apache Arrow governance.

paleolimbot commented 1 year ago

I do like the idea of "community" extension types. We have at least one in the R package, and listing them may help showcase and promote their use. As Joris noted, the geospatial extension types - at least the list/struct-based ones - aren't at the canonical/vote stage yet (although extension types for WKB, WKT, and/or GeoJSON could be helpful for adoption in ADBC and are fairly uncontroversial).

wjones127 commented 1 year ago

Yes, “community extension types” is what I was thinking. IMO, having GeoArrow governed by a group of geospatial data experts sounds preferable to having it governed by the general Arrow community :)