ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Locality: spatial vs. point-radius #4259

Closed dustymc closed 2 years ago

dustymc commented 2 years ago

Is your feature request related to a problem? Please describe.

The locality model contains both point-radius and "footprint" data. It's not clear which takes precedence, and there's nothing to stop them from conflicting.

Describe what you're trying to accomplish

  1. Clarify what locality data are primary
  2. Do magic spatial stuff

Describe the solution you'd like

This is very, very open for discussion; I'm not sure where these data come from, and I may be completely lost.

The proposal: locality "coordinate" data could be given as either coordinates+error or as a polygon, but not both (as is possible, maybe required, now). There would be no possibility of ambiguity. Additional localities can be created and attached to the relevant objects if both kinds of data are available.

This will result in all localities with "coordinates" having both coordinates and spatial data, which will allow queries involving

Describe alternatives you've considered

I don't think there are viable alternatives to sorting this out; we finally have the tools, so this can't be ignored.

Functionality is completely open - this could result in very slightly better documentation, deep spatial abilities in Arctos, or anything in between.

It might be possible to calculate on demand rather than storing, but I suspect we don't have the CPU to fully support that.

Some sort of cache might be possible, but I think both kinds of spatial data are "core" so that seems like unnecessary complexity.

Priority

High - I want to implement all of the cool spatial stuff that's been on the back burner forever!

dustymc commented 2 years ago

How can we prioritize/resolve this?

dustymc commented 2 years ago

AWG: seems reasonable, go.

Name: primary_spatial_data IN (point-radius, polygon)

dustymc commented 2 years ago

I am setting this up so that a point-radius value is generated when a polygon is saved, and a polygon is generated when a point-radius is saved. Circle-polygon "resolution" is defined by specifying a number of segments per quarter-circle. Large values are more circular but take more disk space. Small values are nicer to store, back up, export to DwC, download as text, etc., but are also less circular/accurate. I'm not sure how to balance that.

A 4-segment circle looks like this:

[Screenshot, 2022-03-10 10:51 AM: 4-segment circle]

An 8-segment circle (PostGIS's default) takes about twice the disk space:

[Screenshot, 2022-03-10 10:52 AM: 8-segment circle]

VERY roughly, 4-segment circles would use about 25MB of disk and 8-segment would require around 50MB with current data. The current table (without calculated polygons) is about 400MB.

The shapes are the same for any size of error radius, but the difference is more noticeable with large errors, where a user is likely to zoom past the error boundary.
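To make the segments-versus-disk tradeoff concrete, here is a minimal Python sketch (not Arctos code) of a circle approximated with `quad_segs` segments per quarter-circle, the same knob PostGIS's `ST_Buffer` exposes with a default of 8. It works in planar meters and assumes every vertex sits on the true circle; both are simplifying assumptions. The gap between each straight segment and the real circle grows linearly with the radius, which is why coarse circles show more on large errors.

```python
import math


def circle_ring(cx, cy, radius, quad_segs=8):
    """Closed ring approximating a circle with quad_segs segments per quarter-circle.

    Planar coordinates (e.g. meters); every vertex lies on the true circle.
    """
    n = 4 * quad_segs  # total number of segments around the circle
    ring = [
        (cx + radius * math.cos(2 * math.pi * i / n),
         cy + radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]
    ring.append(ring[0])  # polygon rings repeat the first vertex to close
    return ring


def max_gap(radius, quad_segs):
    """Largest distance between the true circle and any segment (the sagitta)."""
    return radius * (1 - math.cos(math.pi / (4 * quad_segs)))


if __name__ == "__main__":
    for q in (4, 8):
        ring = circle_ring(0.0, 0.0, 10_000, q)
        print(f"{q} segs/quarter: {len(ring)} stored points, "
              f"max gap {max_gap(10_000, q):.0f} m on a 10 km radius")
```

Going from 4 to 8 segments per quarter roughly doubles the stored points (17 vs 33) and cuts the worst-case gap by about a factor of four (roughly 192 m vs 48 m on a 10 km radius).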

If nobody has opinions before this goes to production (sometime next week, maybe) I'll go with the default of 8 segments.

Help!

@mkoo

sharpphyl commented 2 years ago

@Dusty I want to make sure I understand all this, as creating polygons is something we've wanted to do to deal with marine offshore localities. We've only created a few and don't fully understand how it works.

I created a polygon in DMNS:Inv:26912 to describe locality 10897677, which was in your temp_lotsa_err list as having an exorbitant error radius. It was an attempt to describe the Peru-Chile trench https://en.wikipedia.org/wiki/Peru–Chile_Trench which is 5,900 km long.

The first attempt used a point/error radius approach resulting in an error radius the size of South America. So I drew a polygon (with more than 8 segments) which still shows the point lat/long in the record implying we know where in the trench the specimen was found. Is this what you're referring to in this thread?

[Screenshot, 2022-03-12 6:44 AM: polygon drawn for the Peru-Chile trench locality]

I am setting this up so that a point-radius value is generated when a polygon is saved, and a polygon is generated when a point-radius is saved.

Does that mean that you'll create a circle around this polygon to replace the many segments?

Is it possible to get rid of the point coordinates in the record and just have the spatial description? And is this moving toward a solution for offshore marine localities (which generate hundreds of annotations)? And does the polygon I created violate your limit on segments?

Or do we stop geolocating records like these? Or should it be a feature, and can it have a (more precise) polygon as a feature? Should I convert these areas to WKTs? (I don't know how to do that, but Wikipedia says "For example, PostGIS contains functions that can convert geometries to and from a WKT representation, making them human readable.") Is it something we can do?

Showing the original attempt as "As entered coordinates" implies that we had coordinates to enter into the record which is not true. How can I get rid of the history in the public record? (New issue?)

I know I'm mucking up your focused issue here, but how to handle marine locations has been a problem for years. Does this issue/feature offer a solution we should be using?

dustymc commented 2 years ago

don't fully understand how it works

This should fix that! Basically just open geolocate, draw a polygon, save, save - you can try it out at test.

exorbitant error radius.

Thanks, I'll add that to the other. I don't have any problem with any kind of actual data; I'm just trying to prevent nonsense.

with more than 8 segments

That's data, feel free to use all the segments you need. The ones I'm drawing are metadata approximations of point-radius data which allow me to access tools; they can be "rougher" because they can be re-created from the point-radius (actual data) at any time.

you'll create a circle around this polygon

Yes, but I'm not replacing anything, just creating a point-radius approximation for tools that can't use the spatial data. The core of this issue is keeping track of which one is primary, which lets me freely mess with the secondary.

replace the many segments?

Again, those are your data and you can use all you need.

rid of the point coordinates in the record and just have the spatial description?

As data - sure, I'll add them back, but you don't have to pay any attention to them, they're just for the primitives without cool tools.

moving toward a solution for offshore marine localities

I certainly hope so.

generate hundreds of annotations

See https://github.com/ArctosDB/arctos/issues/3530. I have the tools; this is making the data so I can use them. Not in the next release, though.

does the polygon I created violate your limit on segments?

I just want to be very clear on this: The limit is for me, not you. If you need a terabyte to describe some place I'll try to accommodate. (Part of how I do so is by not using any data I don't need to for myself, hence the segments question.)

stop geolocating records like these?

Not at all; you just have more shapes available than one (circle). You don't need to suggest alpine clams to include some complex coast anymore. (I will, but my huge circle will be accompanied by primary_spatial_data signifying to anyone with spatial tools that they should ignore my version; they don't need it.)

Should I convert these areas to WKTs

I can convert, just open an issue if some tool is missing.

to and from a WKT representation, making them human readable

I suppose that's technically true, but I'm skipping a step and converting to shapes on a map.

Showing the original attempt as "As entered coordinates" implies that we had coordinates to enter into the record which is not true. How can I get rid of the history in the public record? (New issue?)

Yeah, new issue - that's event, not locality, and beyond the scope of this one.

Does this issue/feature offer a solution we should be using?

I don't think it changes anything for you. You can still assert some geography (which we generally don't seem to have), or my offer to do that for you could be resurrected. How I choose appropriate geography just got more complicated - I can't justify ignoring everything but the point now - but that's my problem.

sharpphyl commented 2 years ago

This should fix that! Basically just open geolocate, draw a polygon, save, save - you can try it out at test.

Woohoo! Much better. Do you have a date for the next release?

It looks like GBIF copies the polygon but gives the error Footprint WKT invalid. Should we ignore that?

[Screenshot, 2022-03-12 9:52 AM: GBIF "Footprint WKT invalid" warning]

dustymc commented 2 years ago

It keeps asking for coordinates.

Are you sure that's test? You can select primary_spatial_data, or geolocate will do it for you. There are three possible states:

No coordinate data:

[Screenshot, 2022-03-12 8:56 AM: no coordinate data]

Point-radius:

[Screenshot, 2022-03-12 8:56 AM: point-radius]

or polygon:

[Screenshot, 2022-03-12 8:56 AM: polygon]

Do I need both the polygon and coordinates?

You can't - you enter one, and I'll fill in the other when you save.
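The "enter one, get the other" rule can be sketched as a validation step run on save. This is a hypothetical Python outline, not the actual Arctos trigger: only `primary_spatial_data` and its values come from this issue, while the `Locality` fields and the two converter callables are illustrative names passed in so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat)


@dataclass
class Locality:
    # Hypothetical field names; only primary_spatial_data comes from the issue.
    primary_spatial_data: Optional[str] = None  # None, "point-radius" or "polygon"
    lon: Optional[float] = None
    lat: Optional[float] = None
    radius_m: Optional[float] = None
    polygon: Optional[List[Point]] = None


def on_save(
    loc: Locality,
    polygon_from_point: Callable[[float, float, float], List[Point]],
    point_from_polygon: Callable[[List[Point]], Tuple[float, float, float]],
) -> Locality:
    """Validate the primary data and regenerate the secondary (derived) data."""
    kind = loc.primary_spatial_data
    if kind is None:
        # State 1: no coordinate data at all.
        if any(v is not None for v in (loc.lon, loc.lat, loc.radius_m, loc.polygon)):
            raise ValueError("coordinate data requires primary_spatial_data")
    elif kind == "point-radius":
        # State 2: point-radius is the data; the polygon is a disposable approximation.
        if None in (loc.lon, loc.lat, loc.radius_m):
            raise ValueError("point-radius needs lon, lat and radius_m")
        loc.polygon = polygon_from_point(loc.lon, loc.lat, loc.radius_m)
    elif kind == "polygon":
        # State 3: the polygon is the data; point-radius is derived from it.
        if not loc.polygon:
            raise ValueError("polygon primary needs a polygon")
        loc.lon, loc.lat, loc.radius_m = point_from_polygon(loc.polygon)
    else:
        raise ValueError(f"unknown primary_spatial_data: {kind!r}")
    return loc
```

Because the secondary values are always overwritten from the primary, they can be regenerated at any time without losing anything.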

date for the next release?

Mid/early next week, probably/hopefully.

GBIF... Should we ignore that?

For now, yes. It turns out WKT is super flaky: lots of legacy data were problematic to convert, or converted but can't be fully used because they have some wonky feature such as a ring that "loops around itself." This should (slowly, probably) fix that - as always, exposing data to new tools makes it better.
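One common way a polygon ends up "wonky" is a ring that crosses itself (a bow-tie). Below is a minimal, stdlib-only Python sketch of that single check using the standard segment-orientation test; it is an illustration, not what Arctos or GBIF actually run. In PostGIS the real tools are `ST_IsValid`, `ST_IsValidReason` and `ST_MakeValid`, which cover many more cases than this sketch does.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _within_box(p, q, r):
    """For collinear p, q, r: is q inside the bounding box of segment p-r?"""
    return (min(p[0], r[0]) <= q[0] <= max(p[0], r[0])
            and min(p[1], r[1]) <= q[1] <= max(p[1], r[1]))


def segments_intersect(a, b, c, d):
    """True if segment a-b touches or crosses segment c-d."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear special cases: an endpoint lies on the other segment.
    return ((o1 == 0 and _within_box(a, c, b)) or (o2 == 0 and _within_box(a, d, b))
            or (o3 == 0 and _within_box(c, a, d)) or (o4 == 0 and _within_box(c, b, d)))


def is_simple_ring(ring):
    """Check a closed ring (first point == last point) for self-intersections.

    Adjacent edges always share a vertex, so only non-adjacent pairs are tested;
    degenerate spikes (an edge doubling back on its neighbor) are not detected.
    """
    if len(ring) < 4 or ring[0] != ring[-1]:
        return False  # a polygon ring needs at least 4 points and must close
    edges = list(zip(ring[:-1], ring[1:]))
    n = len(edges)
    for i, j in combinations(range(n), 2):
        if j == i + 1 or (i == 0 and j == n - 1):
            continue  # neighbors share an endpoint by construction
        if segments_intersect(*edges[i], *edges[j]):
            return False
    return True
```

For example, `is_simple_ring([(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)])` flags the bow-tie, while the same four corners in square order pass.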

sharpphyl commented 2 years ago

Are you sure that's test? You can select primary_spatial_data, or geolocate will do it for you

Yes, I wasn't in test and tried to delete my comment, but you beat me to it. It works great in test.

Additional explanation very helpful.

sharpphyl commented 2 years ago

So here's what I get in test when I changed a point-radius to a polygon for http://test.arctos.database.museum/guid/DMNS:Inv:16436. (You do have to start from scratch rather than modifying the coordinates.)

[Screenshot, 2022-03-12 1:41 PM: DMNS:Inv:16436 with a polygon in test]

It looks like geolocate assigns the midpoint as the stated coordinates and the end points as the error radius. It no longer says that they are verbatim coordinates which is helpful for us. Just want to be certain I'm using this correctly as we will probably make a lot of them polygons.

dustymc commented 2 years ago

You do have to start from scratch rather than modifying the coordinates.

Yes, I hope I'll figure out modifying polygons at some point, but I don't have that yet.

geolocate assigns

Nope, geolocate is just building the polygon. When it saves, a trigger builds whatever's missing (and converts the weird thing from geolocate into the geography datatype), and there are two important considerations there:

  1. The stuff I add is not "data," think of it like a finding aid or approximation for tools that can't use the original data - that's the distinction the new concept makes, and
  2. Since it's generated, it can be regenerated at any time. I'm currently grabbing the centroid of the minimum bounding circle around the polygon, then grabbing the minimum radius of that to build my circle. There's a bunch of flip-flopping back and forth between geometry and geography to get the units I need, and maybe a real GIS-ologist would have my head for it, but it results in something that fits in the point-radius model and looks about right on a map. If someone comes up with a better idea, I'll just steal it and regenerate everything; no big deal.
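The polygon-to-point-radius step in item 2 can be sketched roughly as follows. This is a stand-in for the PostGIS geometry/geography round trip under loud assumptions: coordinates are (lon, lat) in degrees, the ring is projected to a local equirectangular plane in meters around the mean of its vertices (only reasonable for small areas away from the poles and the antimeridian, so not for a 5,900 km trench), and the minimum bounding circle is found by brute force over vertex pairs and triples. PostGIS provides `ST_MinimumBoundingCircle` and `ST_MinimumBoundingRadius` for the real thing.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a spherical-Earth assumption


def _project(lon, lat, lon0, lat0):
    """Equirectangular projection to meters around (lon0, lat0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def _unproject(x, y, lon0, lat0):
    """Inverse of _project."""
    lat = lat0 + math.degrees(y / EARTH_RADIUS_M)
    lon = lon0 + math.degrees(x / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
    return lon, lat


def _circumcircle(a, b, c):
    """Circle through three points, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)


def min_bounding_circle(points):
    """Smallest circle containing all points (defined by 2 or 3 of them); brute force."""
    if len(points) == 1:
        return points[0], 0.0
    candidates = [(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2), math.dist(a, b) / 2)
                  for a, b in combinations(points, 2)]
    for trio in combinations(points, 3):
        circle = _circumcircle(*trio)
        if circle is not None:
            candidates.append(circle)
    best = None
    for center, r in candidates:
        tol = 1e-9 * max(1.0, r)  # absorb floating-point noise on boundary points
        if all(math.dist(center, p) <= r + tol for p in points):
            if best is None or r < best[1]:
                best = (center, r)
    return best


def point_radius_from_polygon(ring):
    """(lon, lat, radius_m) of the minimum bounding circle of a lon/lat ring."""
    pts = ring[:-1] if ring[0] == ring[-1] else ring
    lon0 = sum(p[0] for p in pts) / len(pts)
    lat0 = sum(p[1] for p in pts) / len(pts)
    (cx, cy), r = min_bounding_circle([_project(lon, lat, lon0, lat0) for lon, lat in pts])
    lon, lat = _unproject(cx, cy, lon0, lat0)
    return lon, lat, r
```

The brute-force search is O(n^4) in the number of vertices, which is fine for a sketch but not for large polygons; Welzl's algorithm is the usual linear-expected-time alternative.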

no longer says that they are verbatim coordinates

Right, there's no need because I'm not doing anything fancy with data - I'm just converting to the right datatype (and making up some other disposable-ish stuff because it's useful to do so).

certain I'm using this correctly

Looks reasonable to me. Even if what you drew isn't a perfect representation of what you were given, it seems a heck of a lot more precise than the giant circle we've been mostly limited to - you're no longer asserting that the thing might be from a half-mile inland or a half-mile offshore.

Note also that there's a new map border color - this one is orange because there's no geography spatial data. You might want to file an Issue (or not, I can find those from the geography data) - I don't know what to do about it right now, but hopefully at some point we'll find a way to get those data.

[Screenshot, 2022-03-12 1:48 PM: map with the new orange border]

sharpphyl commented 2 years ago

Big, big step forward for us. Yes, much better than the big circles.

this one is orange because there's no geography spatial data.

By "geography spatial data" do you mean a WKT or similar "official" area? Just trying to learn the lingo.

dustymc commented 2 years ago

By "geography spatial data" do you mean

Just a spatial/polygon/mappable representation of the geography record. (WKT is a format for spatial data. It's no longer what we store, but we can convert to and from it.)
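For anyone else learning the lingo: WKT is just text. A minimal Python sketch of converting a single-ring polygon to and from WKT follows. It handles only `POLYGON` with one outer ring and no holes, and it assumes the usual WKT axis order of x y, which for geographic data means longitude then latitude. Real data would go through PostGIS (for example `ST_AsText` and `ST_GeogFromText`) or a GIS library rather than this.

```python
import re

_POLYGON_RE = re.compile(r"\s*POLYGON\s*\(\((.*)\)\)\s*", re.IGNORECASE | re.DOTALL)


def ring_to_wkt(ring):
    """Serialize a (lon, lat) ring as a WKT POLYGON with no holes."""
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]  # WKT rings must repeat the first point
    return "POLYGON((" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + "))"


def wkt_to_ring(wkt):
    """Parse a single-ring WKT POLYGON back into a list of (lon, lat) tuples."""
    match = _POLYGON_RE.fullmatch(wkt)
    if match is None:
        raise ValueError("not a simple WKT POLYGON")
    body = match.group(1)
    if "(" in body or ")" in body:
        raise ValueError("polygons with holes are not handled in this sketch")
    ring = []
    for pair in body.split(","):
        lon, lat = (float(v) for v in pair.split())
        ring.append((lon, lat))
    return ring
```

For example, `ring_to_wkt([(0, 0), (1, 0), (1, 1)])` closes the ring and returns `POLYGON((0 0, 1 0, 1 1, 0 0))`.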

"official"

Just useful.