SpeciesFileGroup / taxonworks

Workbench for biodiversity informatics.
http://taxonworks.org
MIT License

Model: Gazetteer - user/project specific geospatial shapes #1954

Open mjy opened 3 years ago

mjy commented 3 years ago

As a curator I want to be able to add and name my own shapes outside the Georeference model.

Likely keep this simple, just multi-polygon.

proceps commented 3 years ago

I would also add an option to combine a few polygons into a single shape.

debpaul commented 3 years ago

Interesting. How does that look (what is the use case for multi-polygons)?

FYI: GEOLocate (as part of two upcoming Thematic Collection Networks in the ADBC program) will be adding some functions, for example polygons with a "hole" in the middle. (Use case: I want to study terrestrial life whose habitat is fresh-water shorelines, so I essentially need to be able to make "donut" shapes around lakes.) I do not know anything about how Nelson and DJ (developer) will do that, from either the UI or the data format point of view.
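To make the "donut" idea concrete, here is a minimal sketch of a polygon with a hole, modeled as one exterior ring plus a list of interior rings, with a ray-casting containment test. This says nothing about how GEOLocate will do it; the names `point_in_ring` and `point_in_donut` and the square "lake" are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def point_in_ring(point: Point, ring: Ring) -> bool:
    """Ray casting: toggle on each edge crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_donut(point: Point, exterior: Ring, holes: List[Ring]) -> bool:
    """A point is in the shape if it is inside the exterior and outside every hole."""
    if not point_in_ring(point, exterior):
        return False
    return not any(point_in_ring(point, hole) for hole in holes)


# Hypothetical shoreline zone: a 10x10 square with a 4x4 "lake" cut out of the middle.
shore = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
lake = [(3.0, 3.0), (7.0, 3.0), (7.0, 7.0), (3.0, 7.0)]

on_shore = point_in_donut((1.0, 1.0), shore, [lake])
in_lake = point_in_donut((5.0, 5.0), shore, [lake])
```

Points exactly on a ring boundary are not treated specially here, so a real implementation would need to decide how to handle them.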

debpaul commented 3 years ago

@proceps what do you mean by multiple polygons in a single shape?

proceps commented 3 years ago

For example, we have shapes of individual US states, and I would like to combine them into Midwest.
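A rough sketch of what "combine" could mean at the data level, assuming the shapes are available as GeoJSON-style dictionaries. It only groups the parts into one MultiPolygon; it does not dissolve shared borders between neighbours, which would need a real geometry library. The function name and the two toy shapes are hypothetical, not actual state outlines.

```python
from typing import Any, Dict, List

Geometry = Dict[str, Any]


def combine_polygons(shapes: List[Geometry]) -> Geometry:
    """Collect Polygon and MultiPolygon geometries into a single MultiPolygon.

    Parts are grouped as-is; shared borders are NOT merged.
    """
    parts = []
    for shape in shapes:
        if shape["type"] == "Polygon":
            parts.append(shape["coordinates"])
        elif shape["type"] == "MultiPolygon":
            parts.extend(shape["coordinates"])
        else:
            raise ValueError(f"unsupported geometry type: {shape['type']}")
    return {"type": "MultiPolygon", "coordinates": parts}


# Toy stand-ins for two state shapes (GeoJSON rings repeat the first vertex at the end).
state_a = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
}
state_b = {
    "type": "MultiPolygon",  # e.g. a mainland plus an island
    "coordinates": [
        [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]],
        [[[3, 0], [4, 0], [4, 1], [3, 1], [3, 0]]],
    ],
}

midwest = combine_polygons([state_a, state_b])
```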

mjy commented 3 years ago

@debpaul "multi_polygon" is a data type in postgres; it allows for lines, single polygons, etc., i.e. we're not necessarily saying we want multiple shapes.

This is a spatial library for the user. They can use their shapes for queries, clone them to georeferences, etc. It's not just a georeference function.

mjy commented 3 years ago

Meh- sorry, "multi_polygon" should be "geometry_collection".

mjy commented 6 months ago

Model

Base (required)

Utility

Relations to GeographicArea

Relations to GeographicItem

We tried to keep all spatial representation in a single table. This means we should add a geography column to GeographicItem (with the downstream plan of merging all the remaining geo columns into it).

Extensions

bpescador commented 6 months ago

I would like the option to add protected areas, ecoregions, and bioregions. Why? So that I can filter on these areas to generate species lists, COs, etc.

kleintom commented 5 months ago

Some first thoughts - I'm sure I'm missing/confused about some things; corrections or things to think about that I'm leaving out welcome :)

Creation

Updating

Uses

Additions welcome!

Parenting

My first thought is that if you can draw/input a Gaz by hand, then anything other than containment would seem difficult to specify for a parent. Maybe for two things that have a level/iso/adm code specified, more should/would be required. (Probably not validated on GA because that's all determined at build time by the build scripts?) I don't have a real feel for other possible answers to this yet though (and am not yet very clued into the ways in which current parentage is used), thoughts welcome.

Relations to GeographicArea

First thoughts - I'm not yet clued in to all of the implications/technical details of these decisions:

Relations to GeographicItem

Purely in terms of storing an arbitrary shape in a column of type geometry, this seems to work fine. GeographicItem has many methods, and I haven't gotten through them all yet in accounting for the new separate data type, but that should work out.

In terms of testing, if GeographicItem is eventually going to put all shapes in a single geography column, does it make sense to just duplicate all of the existing GeographicItem specs and adjust them for a geography-column shape?

mjy commented 5 months ago

Which shapes for Gazs should be allowed?

All. Anything that we can spatially compute with.

Hierarchical data maybe requires more custom imports?

Target individual (== flat) records initially. If we can determine semantics between other records (see parent_id musings) then we can add those after the fact. Our primary goal is to 1) add shapes and 2) reference those in place of GAs.

This reminds me that we need to think about how to optionally share across projects. We could let project_id == nil assert that a record is sharable; however, managing updates in that case is difficult downstream. Another approach is to make use of the is_public flag or create is_shared. Finally (and this is where I would lean to start) we could simply clone Gaz entries, while pointing to the identical shape. This would permit discovery and forking of use.
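A minimal sketch of the "clone the entry, share the shape" option, written as plain Python rather than the actual Rails models. Apart from `project_id`, every name here (`Gazetteer`, `geographic_item_id`, `cloned_from_id`, `clone_to_project`) is an assumption made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Gazetteer:
    id: int
    name: str
    project_id: int
    geographic_item_id: int  # hypothetical pointer to the stored shape
    cloned_from_id: Optional[int] = None  # hypothetical: records where a clone came from


def clone_to_project(gaz: Gazetteer, project_id: int, new_id: int) -> Gazetteer:
    """Copy the Gaz record into another project while reusing the same shape."""
    return replace(gaz, id=new_id, project_id=project_id, cloned_from_id=gaz.id)


original = Gazetteer(id=1, name="My study site", project_id=1, geographic_item_id=42)
shared = clone_to_project(original, project_id=2, new_id=2)
```

Each project then owns its own record (name and other attributes can diverge), while the shape is stored once.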

but iirc there aren't currently any GAs that associate to multiple GeoItems

Correct. The multiple shapes relationship is a 1:1 representation of the same GA, from different origins.

does existing code check for/handle multiple GeoItems for a GA?

We have default priority, this logic is encoded in GI.

GA tree as Gazs, which could then be edited

Don't sweat editing now. However, the first (and perhaps only) edit I'd like to include is to clip to a bounding box. This use case is to set up for rendering base-maps as png/svg.
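For reference, a sketch of what a bounding-box clip does to a single ring, using Sutherland-Hodgman clipping. This is a stand-in technique chosen for illustration, not what TaxonWorks or PostGIS would use; it ignores holes and can leave degenerate edges on concave shapes. The function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def clip_ring_to_bbox(
    ring: List[Point], xmin: float, ymin: float, xmax: float, ymax: float
) -> List[Point]:
    """Clip an open ring (no repeated closing vertex) to an axis-aligned box."""

    def clip(points, inside, intersect):
        # One Sutherland-Hodgman pass against a single box edge.
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps to the last vertex when i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def at_x(x):
        # Intersection of segment p-q with the vertical line at x.
        def f(p, q):
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return f

    def at_y(y):
        # Intersection of segment p-q with the horizontal line at y.
        def f(p, q):
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return f

    edges = [
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    ]
    points = list(ring)
    for inside, intersect in edges:
        if not points:
            break  # the ring lies entirely outside the box
        points = clip(points, inside, intersect)
    return points


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
clipped = clip_ring_to_bbox(square, 5.0, 5.0, 15.0, 15.0)
```

In this example only the upper-right quarter of the square overlaps the box, so the result is that quarter.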

we'd want to be able to do all of those updates for Gazs

Maybe. A write once read/replace pattern would also work. I.e. if a user needs to edit the shape then they add a new one or replace the shape. Other attributes that we add should be editable.

issue of parenting will play a role in regards to how GA filtering can be combined with Gaz filtering

Yes, we have multiple combining paths for GA already (see descendants options). These can be parameterized out in the filter. A major theme throughout is to use spatial queries whenever possible, and to guide/direct users to start thinking spatially, not with labels.

would be to have completely separate GA/Gaz filters

Yes.

Autocomplete on GA names

Should only hit GAs in the smart selector. Merged spatial facets are also possible, but these are different code. I want to keep the two isolated at first, with convenience second.

clued into the ways in which current parentage

Parentage needs much cleaner semantics in Gaz, if indeed we use it at all (is it geopolitical, spatial, set defining). Let's not try to sync it with GA for now, because there it means "as used in TW to build the Gaz, roughly geopolitical sets", which could be better defined.

Then you would be able to make a GA a parent of a Gaz and vice versa

No editing GA by users. This will remain an Admin-only defined set of data. Down the road we could envision a user suggesting some GZ be included in the GA, but that's for future consideration. No swapping parenthood for GAs. Asserting a GA is some semantic relation to GZ is possible, but downstream priority re implementation. In general we want to allow "pick a name, use the shape as a spatial search; pick a set, use that set in a combined spatial search".

does it make sense to just duplicate all of the existing GeographicItem specs and adjust them for a geography-column shape?

Yes, definitely, certainly on a dev branch.

kleintom commented 5 months ago

Are we able to assume postgis >= 3.0 yet? It's not necessary but it would simplify some of the existing and new code around things like ST_Contains(geometry_collection) which isn't supported in earlier versions.
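If code ever needs to branch on this, a small sketch of checking the version, assuming a string in the format that `postgis_full_version()` reports (it starts with `POSTGIS="x.y.z ..."`). The helper names and both sample strings are made up for illustration, and the 3.0 threshold is simply the one mentioned above.

```python
import re
from typing import Tuple


def postgis_version(full_version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a postgis_full_version()-style string."""
    match = re.search(r'POSTGIS="(\d+)\.(\d+)\.(\d+)', full_version)
    if match is None:
        raise ValueError("no POSTGIS version found")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def at_least_postgis_3(full_version: str) -> bool:
    """True when the reported version is 3.0.0 or later."""
    return postgis_version(full_version) >= (3, 0, 0)


# Illustrative strings only, shaped like postgis_full_version() output.
newer = 'POSTGIS="3.4.1 ca035b9" [EXTENSION] PGSQL="150"'
older = 'POSTGIS="2.5.5" [EXTENSION] PGSQL="110"'
```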

mjy commented 5 months ago

@LocoDelAssembly can confirm what we are using in production, but you should assume for dev evolution 3.3.6 and PG 15. I'd be happy to push that as far forward as we can vs. 1) Ubuntu support and 2) the postgis adapter (https://github.com/rgeo/activerecord-postgis-adapter) and related rgeo library. I'm happy to push all fronts forward as part of this (including feedback to the adapter folk).

LocoDelAssembly commented 5 months ago

Postgres: PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

Postgis: POSTGIS="3.4.1 ca035b9" [EXTENSION] PGSQL="150" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/share/proj/proj.db" GDAL="GDAL 3.4.1, released 2021/12/27" LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" RASTER

kleintom commented 5 days ago

@bpescador, I'm copying your comment from my Ecoregions gist link to here where the TW team can comment:

I assume that with the new Ecoregion shapefile gazetteer, one could filter COs and CEs on a particular Ecoregion (Biome or "biogeographic realm"). However, would there be a way to create a predicate field "Ecoregion," "Biome," and "biogeographic realms" and have the data filled in based on the gazetteer shapefiles so the names could be exported and used in other analyses?
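Conceptually, filling in such a field is a point-in-polygon lookup: for each record's coordinates, find the region whose shape contains it and store that region's name. A toy sketch under those assumptions follows; the region names, record ids, and function names are all hypothetical, and real ecoregion shapes would come from the shapefile and be queried in PostGIS rather than in Python.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def contains(ring: Ring, point: Point) -> bool:
    """Ray-casting point-in-polygon test for a simple ring without holes."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def label_points(
    points: Dict[str, Point], regions: Dict[str, Ring]
) -> Dict[str, Optional[str]]:
    """Map each record id to the name of the first region containing it, else None."""
    labels: Dict[str, Optional[str]] = {}
    for record_id, point in points.items():
        labels[record_id] = next(
            (name for name, ring in regions.items() if contains(ring, point)),
            None,
        )
    return labels


# Hypothetical regions and collecting-event coordinates.
regions = {
    "Ecoregion A": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "Ecoregion B": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}
events = {"ce1": (2.0, 3.0), "ce2": (15.0, 5.0), "ce3": (50.0, 50.0)}

labels = label_points(events, regions)
```

The resulting mapping is the kind of name-per-record data that could then be written to a predicate field and exported.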