developmentseed / lonboard

A Python library for fast, interactive geospatial vector data visualization in Jupyter.
https://developmentseed.org/lonboard/latest/
MIT License

[EPIC] Enable visualization of cloud-hosted geoparquet datasets #314

Open emmalu opened 6 months ago

emmalu commented 6 months ago

Context

GeoParquet is a relatively new but powerful OGC standard for distributing and visualizing vector data at scale. We can make a strong case for why and how all users can leverage this data format, including through Lonboard.

Issue

Enable the visualization of cloud-hosted geoparquet datasets.

Acceptance-Criteria

These are the tasks that need to be completed or artifacts that need to be produced.

kylebarron commented 6 months ago

Another possibility, with lonboard specifically, is visualizing a cloud-hosted geoparquet dataset. That would only really be possible with the upcoming bounding box definition in GeoParquet 1.1 (https://github.com/opengeospatial/geoparquet/pull/191), so it is more of a longer-term goal.

Otherwise, many of the tasks here are focused on the JS API, and we may or may not wish to track those inside the lonboard repo.

emmalu commented 6 months ago

Understood, thanks for the corrections @kylebarron. I've updated the issue and left room to expand if/when #191 is complete.

kylebarron commented 6 months ago

I think there's an element of this that may be useful in Python too.

In particular, in geoarrow-rs, we can have a ParquetFile and/or ParquetDataset class for single-file and multi-file parquet data. Then when scanning the metadata (especially with GeoParquet 1.1) we can have a method like "get boxes" where we materialize a table of metadata with each row group and its bounding box. Then we can visualize each partition's extent.
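A rough sketch of what a "get boxes" step could look like, in plain Python rather than geoarrow-rs. It assumes each row group's min/max column statistics have already been read from the Parquet footer, and that the file has a `bbox` struct column with `xmin`, `ymin`, `xmax`, `ymax` fields, so the statistics are keyed as `bbox.xmin` and so on. The names `RowGroupBox`, `get_boxes`, and `example_stats` are hypothetical and not part of any existing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed input shape: for one row group, column path -> (min, max) statistics.
ColumnStats = Dict[str, Tuple[float, float]]


@dataclass(frozen=True)
class RowGroupBox:
    """Bounding box covering every geometry in one row group (hypothetical type)."""
    row_group: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def get_boxes(row_group_stats: List[ColumnStats]) -> List[RowGroupBox]:
    """Build one bounding box per row group from bbox column statistics.

    The overall extent is the minimum of the per-row xmin/ymin values and
    the maximum of the per-row xmax/ymax values.
    """
    boxes = []
    for index, stats in enumerate(row_group_stats):
        boxes.append(
            RowGroupBox(
                row_group=index,
                xmin=stats["bbox.xmin"][0],  # smallest xmin in the row group
                ymin=stats["bbox.ymin"][0],  # smallest ymin in the row group
                xmax=stats["bbox.xmax"][1],  # largest xmax in the row group
                ymax=stats["bbox.ymax"][1],  # largest ymax in the row group
            )
        )
    return boxes


# Made-up statistics for two row groups, for illustration only.
example_stats = [
    {
        "bbox.xmin": (-10.0, -2.0),
        "bbox.ymin": (40.0, 44.0),
        "bbox.xmax": (-9.5, -1.0),
        "bbox.ymax": (40.5, 45.0),
    },
    {
        "bbox.xmin": (5.0, 8.0),
        "bbox.ymin": (50.0, 52.0),
        "bbox.xmax": (5.5, 9.0),
        "bbox.ymax": (50.5, 53.0),
    },
]

boxes = get_boxes(example_stats)
```

Each resulting box could then be turned into a polygon and plotted to show the extent of every partition.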

This would be a pretty interesting use case with, e.g., Overture data, where a user could quickly find data in their region of interest from Python.
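To illustrate the region-of-interest lookup, here is a minimal sketch that picks out the row groups whose bounding box overlaps a query box. It assumes per-row-group boxes are already available as plain tuples. `Box`, `intersects`, and `row_groups_in_region` are hypothetical names, and the coordinates are made up.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    """Return True if the two boxes overlap or touch."""
    return (
        a.xmin <= b.xmax
        and a.xmax >= b.xmin
        and a.ymin <= b.ymax
        and a.ymax >= b.ymin
    )


def row_groups_in_region(boxes: List[Box], region: Box) -> List[int]:
    """Indices of row groups whose extent intersects the region of interest."""
    return [i for i, box in enumerate(boxes) if intersects(box, region)]


# Made-up extents for three row groups.
row_group_boxes = [
    Box(-10.0, 40.0, -1.0, 45.0),
    Box(5.0, 50.0, 9.0, 53.0),
    Box(-3.0, 42.0, 6.0, 51.0),
]

# Made-up region of interest.
region = Box(4.0, 49.0, 7.0, 52.0)

matching = row_groups_in_region(row_group_boxes, region)
```

Only the matching row groups would then need to be fetched from remote storage, which is what makes this attractive for large cloud-hosted datasets.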

cc @weiji14 who has been working on remote Parquet file support in https://github.com/geoarrow/geoarrow-rs/pull/493.