ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

treat spatial data as spatial data: locality meta issue #2660

Closed dustymc closed 2 years ago

dustymc commented 4 years ago

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Is your feature request related to a problem? Please describe.

We store spatial data as numbers and such (which will get slightly weirder in Postgres). We then use crazy math, JavaScript libraries, and such to attempt to do stuff with those data, triggers to keep them properly typed, etc.

Describe what you're trying to accomplish

Treat spatial data as spatial data.

Describe the solution you'd like

  1. Install PostGIS
  2. Convert things that can be represented as spatial data to spatial data
  3. Treat them as spatial data

Describe alternatives you've considered

Struggle on as we have been.

Priority

I think high, but firmly post-migration. Should be done in conjunction with other locality issues so we're not rebuilding forms and then immediately rebuilding the same forms again.

https://github.com/ArctosDB/arctos/issues/2659 https://github.com/ArctosDB/arctos/issues/2658 https://github.com/ArctosDB/arctos/issues/2498 https://github.com/ArctosDB/arctos/issues/2274 https://github.com/ArctosDB/arctos/issues/2271 https://github.com/ArctosDB/arctos/issues/2102 https://github.com/ArctosDB/arctos/issues/2068

PLEASE TAG THIS ISSUE IN ANY LOCALITY STRUCTURE RELATED ISSUES!

Jegelewicz commented 4 years ago

See https://github.com/tdwg/dwc-qa/issues/156

dustymc commented 4 years ago

From https://github.com/ArctosDB/arctos/issues/919

Use case: searching for specimens that intersect a polygon like Yosemite NP using the footprint or WKT polygon even if specimen does not reference Yosemite in the locality description. Can we leverage Google Maps API or BerkeleyMapper perhaps?
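The polygon-intersection use case above maps directly onto a PostGIS query, without leaning on Google Maps API or BerkeleyMapper. A minimal sketch - the table and column names here are hypothetical, not actual Arctos schema:

```sql
-- Hypothetical schema: a locality table with a geography column, and the
-- park boundary supplied as WKT. ST_Intersects on geography finds every
-- locality touching the polygon, regardless of locality description text.
SELECT l.locality_id
FROM locality l
WHERE ST_Intersects(
        l.spatial_geog,
        ST_GeogFromText('POLYGON((-119.9 37.5, -119.9 38.2, -119.2 38.2, -119.2 37.5, -119.9 37.5))')
      );
```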

mkoo commented 4 years ago

Do we need to leverage GMA or BM? Why not just send polygon data as a bounding box search like we do on the Arctos search page? Aren't there PG libraries for that? Those WKTs can be converted to other formats too.

dustymc commented 3 years ago

Do we need to leverage GMA or BM?

That might solve some issues, but I don't think it's a complete solution.

just send polygon data

Also only part of the problem.

PG libraries

That's (part of) what I'm looking for.

Nicole-Ridgwell-NMMNHS commented 3 years ago

This is really important for NMMNHS spatial data, which, because we use UTMs, is currently basically not functional in Arctos. Aside from NMMNHS need, as dusty says, there is a ton of neat stuff we could do with this functionality.

campmlc commented 3 years ago

Sounds like a good idea!

Jegelewicz commented 3 years ago

What and who do we need to make this happen?

dustymc commented 3 years ago

TACC says PostGIS is "not a huge deal." I don't think we're going to do anything too radical to the locality model, but it's perhaps worth a quick AWG discussion - otherwise I think this just needs to be prioritized. I'll make it my next task.

dustymc commented 3 years ago

TACC has elaborated on "not a huge deal" by turning postgis on in test.

Jegelewicz commented 3 years ago

As long as that can translate to production!

dustymc commented 3 years ago

production

No problem, but will require a restart.

dustymc commented 3 years ago

AWG Issues Group: Go, figure out how to UTM-->DD.dd as a start.

mkoo commented 3 years ago

one note about UTM to DD conversion, we might want to ensure we can do southern hemisphere transformations too, which will require a utm latitude band letter as well as the zone.

so for UTM, we need these fields: UTM zone, band letter (if northern hemisphere maybe we can assume this and not require it), northing, easting, datum (but maybe we state it is assumed to be WGS84?)
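For the WGS84 case, zone plus hemisphere is enough to derive the SRID arithmetically (EPSG reserves 32601-32660 for northern zones and 32701-32760 for southern), so the band letter is only strictly needed to establish hemisphere. A sketch of that derivation:

```sql
-- Assuming WGS84: EPSG 32601-32660 are UTM north zones, 32701-32760 south.
-- e.g. zone 33 north -> 32633, zone 10 south -> 32710.
SELECT zone, hemisphere,
       CASE WHEN hemisphere = 'N' THEN 32600 + zone
            ELSE 32700 + zone END AS srid
FROM (VALUES (33, 'N'), (10, 'S')) AS t(zone, hemisphere);
```

Other datums (NAD27, NAD83, ...) have their own SRID ranges, which is why a lookup table rather than arithmetic is probably the safer long-term design.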

dustymc commented 3 years ago

It seems like what I actually require is an SRID, but I'm not yet sure there isn't a more human-readable proxy for it.

32633 is 33N / WGS84, for example.

and I have a definition for it


arctosprod@arctosutf>> select * from spatial_ref_sys where srid=32633;
 srid  | auth_name | auth_srid |                                                                                                                                                                                                                                                                                                        srtext                                                                                                                                                                                                                                                                                                         |                     proj4text                      
-------+-----------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------
 32633 | EPSG      |     32633 | PROJCS["WGS 84 / UTM zone 33N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",15],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32633"]] | +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs 
(1 row)

I haven't been brave enough to try, but I think I can add to my definitions - e.g., I should be able to convert to/from anything I can find on spatialreference.org, in theory.

Below is what I'm starting with - I'm not sure how many of these are valid; I think perhaps most of them are something that someone made up for some reason. It might be faster (certainly more accurate) to fire up a utm_zone code table (with SRID) and manually translate them once. That is almost certainly a requirement going forward: values like "not" or "0.5" should be preemptively rejected, and I'm not sure how that could happen without a CT.

I think we'll need to add SRID to ctdatum as well, but that can be Step Two (probably).
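The proposed code table could be as simple as one row per zone+hemisphere carrying the SRID used for conversion. A sketch, with hypothetical names (the `spatial_ref_sys` table is real PostGIS, shown in the query above):

```sql
-- Hypothetical utm_zone code table: constrains incoming zone values
-- (rejecting junk like 'not' or '0.5') and maps each to a WGS84 SRID.
CREATE TABLE ctutm_zone (
  utm_zone varchar PRIMARY KEY,  -- e.g. '33N', '10S'
  srid     integer NOT NULL REFERENCES spatial_ref_sys (srid)
);

INSERT INTO ctutm_zone (utm_zone, srid) VALUES ('33N', 32633), ('10S', 32710);
```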


arctosprod@arctosutf>> select distinct utm_zone from collecting_event where utm_zone is not null order by utm_zone
arctosutf-> ;
 utm_zone 
----------
 0.5
 06
 1
 10
 10N
 10S
 10T
 11
 11N
 11S
 12
 125
 12N
 12R
 12S
 12s
 13
 135
 13N
 13R
 13S
 13T
 13s
 14
 15
 15T
 15U
 16
 16N
 16T
 16U
 17
 17P
 17T
 17U
 18
 185
 18T
 19
 200
 27
 28
 36K
 36L
 37K
 37L
 48
 48P
 5
 5W
 6
 6N
 6V
 6W
 83
 AG7
 BG8
 BH6
 BH8
 BJ1
 BJ4
 BP1
 BV4
 CB1
 CB2
 CC3
 CG1
 CH0
 CH2
 CH5
 CH6
 CH7
 CJ0
 CJ9
 CK6
 CK9
 CN1
 CN2
 CP1
 CP3
 CP9
 CQ0
 CS4
 CT8
 DA8
 DG1
 DH2
 DH3
 DH7
 DK5
 DK6
 DK7
 DL7
 DM0
 DN4
 DP3
 DQ0
 DQ3
 DR5
 DR6
 DR9
 DS3
 DS5
 DS6
 DS8
 DS9
 DT4
 DT8
 EA0
 EE8
 EH8
 EJ4
 EN1
 EN6
 EP0
 EP2
 EP3
 EP4
 ER3
 ES0
 ET0
 ET4
 FC7
 FH3
 FH4
 FL8
 FN6
 FR2
 FS4
 GC5
 GK6
 GK7
 GL7
 GN8
 GQ4
 GS9
 GU3
 GU9
 KC5
 KC9
 KD3
 KE2
 KE4
 KE5
 KE6
 KF8
 KH0
 KK4
 KK8
 LC5
 LC6
 LC9
 LD6
 LE6
 LF1
 LG5
 LJ2
 LJ3
 LN5
 LP0
 MD1
 MD3
 MD5
 MD8
 ME3
 MF3
 MF7
 MH2
 MH8
 MK7
 MP2
 MP8
 MQ9
 MT1
 MU4
 MUL
 ND4
 ND5
 NE1
 NE2
 NE9
 NF8
 NG6
 NG7
 NG9
 NH1
 NH8
 NJ0
 NK2
 NK7
 NK9
 NL0
 NL4
 NL9
 NM2
 NM7
 NN4
 NP1
 NQ8
 NQ9
 NR9
 NS0
 Orl
 PB4
 PC0
 PE3
 PE4
 PE6
 PF0
 PF2
 PF3
 PF5
 PG7
 PJ0
 PJ2
 PJ7
 PJ9
 PK9
 PL5
 PM0
 PM7
 PN2
 PP9
 PQ0
 PQ2
 PQ6
 PQ9
 PR5
 PR9
 PS5
 PT2
 PT4
 PT6
 PT9
 QE4
 QF7
 QH3
 QH8
 QJ0
 QJ7
 QN3
 QP4
 QQ0
 QQ3
 QQ7
 QS0
 QS3
 QS7
 QT0
 QT1
 QT2
 QU3
 QU6
 QU8
 QU9
 QV6
 QV7
 QV9
 RV0
 RV1
 RV2
 S13
 SE6
 SE7
 SE8
 SE9
 SF6
 SF7
 SG9
 SQ8
 T13
 TB7
 TC9
 TD0
 TD6
 TD9
 TE0
 TE3
 TE4
 TE5
 TE6
 TE7
 TE8
 TF5
 TF9
 TG6
 TJ0
 TN3
 TN4
 TN9
 TP1
 TQ1
 TS9
 TT9
 TU0
 TU1
 TU2
 TU3
 TU4
 TU6
 TU7
 TV1
 TV5
 UA6
 UA7
 UB3
 UB6
 UB7
 UC3
 UF2
 UF4
 UG1
 UH2
 UH3
 UH4
 UK5
 UL5
 UL8
 UM0
 UN1
 UN7
 UP5
 UQ6
 UQ9
 US0
 US2
 US3
 UT0
 UT1
 UT2
 UT5
 UU0
 UU1
 UU2
 UU3
 UU4
 UV8
 VA3
 VA6
 VD9
 VE6
 VF0
 VF2
 VF4
 VF5
 VF9
 VG4
 VJ0
 VJ6
 VJ8
 VL9
 VM5
 VP1
 VR3
 VR4
 VS5
 VU2
 VU3
 VV3
 WA6
 WC2
 WF1
 WF2
 WG6
 WH3
 WR3
 XB3
 XG5
 XG8
 XH0
 XH8
 XJ4
 XK5
 XL3
 XM6
 XN5
 XN7
 XN8
 XR9
 XU6
 YG8
 YH4
 YH6
 YM3
 YM9
 YN7
 YR3
 YT0
 ZD1
 ZM0
 ZM1
 ZM3
 km
 not
(394 rows)

Nicole-Ridgwell-NMMNHS commented 3 years ago

It might be faster (certainly more accurate) to fire up a utm_zone code table (with SRID) and manually translate them once. That is almost certainly a requirement going forward: values like "not" or "0.5" should be preemptively rejected, and I'm not sure how that could happen without a CT.

Agreed, most of that list of zones is nonsense.

band letter (if northern hemisphere maybe we can assume this and not require it)

It would be very nice if we could assume it for northern hemisphere. Is band letter required for southern hemisphere or do we just need to know that it is southern hemisphere?

dustymc commented 3 years ago

assume it for northern hemisphere.

I'm no utmologist, but https://upload.wikimedia.org/wikipedia/commons/b/b7/Universal_Transverse_Mercator_zones.svg suggests the 'letter bands' are universal, and I don't seem to have any "bare" zones without the letter suffix in my definition file.

dustymc commented 3 years ago

Everything I ever thought I knew about UTM seems to be wrong, and I'm not sure I'm done with that yet.....

We definitely need a zone code table with SRID, BUT in order to not confound datum it will be somewhat arbitrary. I'll just assume everything is WGS84 in the code table (and therefore in the initial conversion), convert to coordinate-like strings (I think using geography at this stage would be problematic, not sure yet), then convert from the original datum to WGS84 (after we add an SRID there...).

PG does not need latitude bands (which I think may not be "official" UTM after all??), but does need hemisphere. I think that just means we'll use the same SRID for lots of entities (eg, all northern zone10 - 10N,10P,..,10X - will use https://spatialreference.org/ref/epsg/32610/).
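The conversion itself then collapses to a one-liner in PostGIS: tag the easting/northing pair with the zone's SRID and transform to WGS84. A sketch with illustrative values (easting 500000 sits exactly on a zone's central meridian, which for zone 10 is 123°W):

```sql
-- One point in UTM zone 10 north (EPSG:32610), easting/northing in meters,
-- transformed to WGS84 decimal degrees. ST_X/ST_Y give lng/lat respectively.
SELECT ST_Y(pt) AS lat, ST_X(pt) AS lng
FROM (
  SELECT ST_Transform(
           ST_SetSRID(ST_MakePoint(500000, 4649776), 32610),
           4326) AS pt
) AS s;
-- lng comes back as exactly -123 (the zone 10 central meridian)
```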

We could potentially drop the redundancy and reduce that to eg "10 north" and "10 south" but the latitude bands do serve as a sort of checksum, and I think may be necessary in some UTM-like systems.

We have a fair bit of "zone S" in the original data, and I'd guess some of it means "Zone S" and some means "South." (The intended meaning of "N" doesn't matter to me, I think.) I think I can continue ignoring that for now, but at some point we're going to have to deal with our existing not-quite-UTM data (perhaps by tossing it into some 'verbatim...' collecting event attribute).

Please let me know ASAP if any of that sounds wildly wrong.

@tucotuco you got any extra fairy dust to sprinkle around? For now, I'm just trying to convert UTM to DD with postgis, but do feel free to stop me doing anything else I'll regret.

@Nicole-Ridgwell-NMMNHS do you have some wild UTM data handy? A small batch of UTM stuff plus state-or-so (just as a sanity check) might be handy in the maybe-not-so-distant future.

Nicole-Ridgwell-NMMNHS commented 3 years ago

We have a fair bit of "zone S" in the original data, and I'd guess some of it means "Zone S" and some means "South."

Yep, this was something that totally confused me as a student; even in my GIS courses it wasn't really explained. I'm going to guess most of the S's mean "Zone S", and probably most of the N's mean North.

do you have some wild UTM data handy?

Sure, here is some private land (not federally protected) data you can work with: Sample UTM Set.zip

dustymc commented 3 years ago

wasn't really explained

Maybe NOBODY really knows! "S for South" certainly seems to be a common thing in PostGIS lingo.

I would be forced to reject the sample UTM data you sent - I don't think I can do this without something that gets me to the right hemisphere. You could easily fake it by choosing N or greater (these are likely almost all S, maybe some R) - I think all valid northern values will be processed the same way - but then your verbatim won't be very verbatim, which might find a way to bite someone in the future. I assume that situation will be common - suggestions for how to deal with it? I'm tempted to suggest we include e.g. "Zone 10, somewhere north of the equator", but I also kinda wanna go wash my brain out with bleach for considering such a thing....

We'll also need to get rid of the N and S suffixes in your value columns, but that should be trivial.

I don't think any of this is stopping me at the moment, just things we need to consider going forward.

Nicole-Ridgwell-NMMNHS commented 3 years ago

I assume that will be common - suggestions for how to deal with it? I'm tempted to suggest we include eg "Zone 10, somewhere north of the equator" but I also kinda wanna go wash out my brain with bleach for considering such a thing....

Our data is easy - just assume northern hemisphere for everything - but that won't necessarily work for other collections' data if we're trying to just take what is currently in UTM verbatim and convert it. Definitely moving forward, data entry needs to require Northern hemisphere/Southern hemisphere, written out so it's less likely to be confused with a band letter.

dustymc commented 3 years ago

Is that a vote for '10N-X' (=north) and '10C-M' (=south) "zones"?

Nicole-Ridgwell-NMMNHS commented 3 years ago

Sure, but if we're going to use that assumption for converting existing values in verbatim, we might want to run a check first to see if anyone has any southern hemisphere data that might break that rule.
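That check can key off the trailing band letter, since MGRS latitude bands C-M lie south of the equator and N-X north of it (I and O are skipped). A sketch against the real `collecting_event.utm_zone` column queried earlier in this thread:

```sql
-- Flag existing utm_zone values whose trailing band letter is in C-M
-- (southern hemisphere) before assuming everything is north.
-- ~* makes the match case-insensitive, since values like '12s' exist.
SELECT utm_zone
FROM collecting_event
WHERE utm_zone ~* '^[0-9]{1,2}[C-HJ-M]$';
```

Anything this returns either genuinely means southern hemisphere or is a "Zone S"-style ambiguity that needs a human decision.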

dustymc commented 3 years ago

I seem to have a functional UTM2WGS84 script.

Todo ASAP:

Todo next:

Todo less-urgently:

Todo eventually:

dustymc commented 3 years ago

@Nicole-Ridgwell-NMMNHS I ran your data through my function (https://github.com/ArctosDB/PG_DDL/blob/master/function/UTM2WGS84.sql), results attached. Please sanity-check a few, and note the missing values in https://docs.google.com/spreadsheets/d/12XIyXXfuhh6pHI9TyvuuKUdgKZpg8EDeQLn4zu1cf1M/edit#gid=94044881 - adding them in now (before I pull those data to production) is OK, or we can deal with them through the normal code table path after that goes to prod.

I'll pop open a new issue for the zone thing; I think that deserves its own discussion.

Here's the code I used.

drop table if exists temp_utmtest;
create table temp_utmtest as select * from temp_cache.temp_dlm_uptbl;

select distinct datum from temp_utmtest;
update temp_utmtest set datum='World Geodetic System 1984' where datum='WGS 84';
update temp_utmtest set datum='North American Datum 1983' where datum='NAD 83';
update temp_utmtest set datum='North American Datum 1927' where datum='NAD 27';

select distinct utmzone from temp_utmtest;
-- not enough information; these are mostly SWUS so fake it
update temp_utmtest set utmzone=utmzone||'S';

select distinct utmn from temp_utmtest;
-- this should be numeric, it ain't
update temp_utmtest set utmn=replace(utmn,'N','');

select distinct utme from temp_utmtest;
-- ditto
update temp_utmtest set utme=replace(utme,'E','');

-- place for results
alter table temp_utmtest add lat double precision;
alter table temp_utmtest add lng double precision;
alter table temp_utmtest add convmsg varchar;

-- make this easy
alter table temp_utmtest add key int;
update temp_utmtest set key=nextval('somerandomsequence');

-- now can actually do stuff
CREATE OR REPLACE FUNCTION tempbutm() RETURNS void AS $body$
DECLARE
  r record;
  j json;
BEGIN
  for r in (select * from temp_utmtest) loop
    select UTM2WGS84(
      r.utmzone,
      r.utmn::int,
      r.utme::int,
      r.datum)
    into j;
    if j::jsonb->>'status'='OK' then
      update temp_utmtest set lat=(j::jsonb->>'lat')::double precision, lng=(j::jsonb->>'lng')::double precision where key=r.key;
    else
      update temp_utmtest set convmsg=concat_ws('::', j::jsonb->>'status', j::jsonb->>'message') where key=r.key;
    end if;
  end loop;
END;
$body$
LANGUAGE plpgsql
SECURITY DEFINER
VOLATILE;

select tempbutm();

temp_utmtest.csv.zip

Nicole-Ridgwell-NMMNHS commented 3 years ago

I ran your data through my function

Looks good.

adding them in now (before I pull those data to production) is OK, or we can deal with them through the normal code table path after that goes to prod.

Depends, I think, on what we do in #3415 - if we go a route where we don't include band letters (other than north/south), then I think we could just add them all at the start.

dustymc commented 3 years ago

Putting this on pause until I have postgis at production - TACC is working on that.

Nicole-Ridgwell-NMMNHS commented 3 years ago

Are there any updates on this?

dustymc commented 3 years ago

I'll ping Chris.

Nicole-Ridgwell-NMMNHS commented 3 years ago

It would be helpful to know whether this will still happen in the near future or whether it is a more lengthy process than we thought for TACC to turn on postgis. If the latter then we can work on a stopgap of creating a structured UTM locality attribute. @dustymc @mkoo

see also https://github.com/ArctosDB/arctos/issues/2309#issuecomment-812559005

dustymc commented 3 years ago

I pinged tacc last week, still waiting for them to schedule something. You'll know more when I know more, unless @mkoo has some magic.

dustymc commented 3 years ago

From @mkoo

Actually, if we could move away from WKTs I'd be super happy. It's great for storing, say, a SQUARE, but it gets challenging with coastlines and anything more complicated than an octagon (probably). I just know it's extra steps to convert from GIS data, PLUS it stores no metadata! (That's the killer when comparing to KML or GeoJSON.)

WKT is also a pain for me and now we (will soon, I hope...) have spatial tools, so

I think the biggest drawback is backups - this will make them larger. Seems a price well worth paying.

This will probably mean it's not possible to use media IDs in the catalog record bulkloader, but I don't think anyone ever has and it's possible (and introduces much less chance for confusion) to pre-create localities and use them.

Nicole-Ridgwell-NMMNHS commented 2 years ago

What is the status of this? I find myself yet again headed back to our old Access database for a bulk file of our UTMs.

dustymc commented 2 years ago

https://github.com/ArctosDB/internal/issues/90 @mkoo help??

Jegelewicz commented 2 years ago

BTW I took some notes at TDWG this week:

We use Natural Earth for land, Marine Regions for the sea (EEZs), and GADM for administrative areas. We've had complaints about GADM being out of date, so we're looking at using GeoBoundaries.org instead. – Matthew Blisset

We have as well. For names, we agreed on ISO-3166 and UN for our published datasets. The high resolution polygons can be difficult to parse over the web (e.g., a PostGIS-backed API). Drainage networks are especially problematic. – Ben Norton

The lines are fine; it's vertices in a certain order, with all kinds of issues that don't make Euclidean sense (slivers, islands, …). Validation on large complex polygons is a nightmare. – Pieter Huybrechts

Nicole-Ridgwell-NMMNHS commented 2 years ago

Very glad to hear that PostGIS is finally getting added!

dustymc commented 2 years ago

DON'T JINX IT!

mkoo commented 2 years ago

Just re-reading this thread. After today, the new converter should be functional. Otherwise this is a basic GIS operation which can be done offline in, say, QGIS (free, open-source). However, it's GIS, so there's a learning curve that may not be everyone's cup of tea.

Hang on until after tonight for Spatial Arctos...

mkoo commented 2 years ago

fingers crossed for anti-jinxing....

dustymc commented 2 years ago

The remaining action item is to store coordinates as geography. The current data structure makes conversion straightforward (there's only one input format and one datum involved), and doing so is very fast. Trigger checks and decimal-format numbers are still involved, but don't seem to cause problems. Geography (geometry, whatever) is not text, doesn't play nice in text files, and we do a lot of text operations.

It seems that we'll have to convert back and forth for various operations, and I'm not sure that where we start matters much. Geography is probably better, but does not seem like a priority at this time. I'm putting this issue on the back burner; it should be revisited when we have more experience using spatial data.
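The back-and-forth conversion mentioned above is cheap in PostGIS, which is part of why the storage choice is low-stakes. A sketch of the round trip (EWKT text to geography and back):

```sql
-- text -> geography: PostGIS accepts EWKT directly via a cast
SELECT 'SRID=4326;POINT(-123.0 41.9)'::geography;

-- geography -> text: ST_AsText renders it back as WKT for text pipelines
SELECT ST_AsText('SRID=4326;POINT(-123.0 41.9)'::geography);
```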

dustymc commented 2 years ago

Closing, can be reopened if we ever find an actual reason to do something else with datatype.