ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Map UTM coordinates #2309

Closed Jegelewicz closed 2 years ago

Jegelewicz commented 4 years ago

See https://www.geoplaner.com/

mkoo commented 4 years ago

Also bulkloading UTMs https://github.com/ArctosDB/PG-migration-testing/issues/29

dustymc commented 4 years ago

Fully handling UTM should be possible under PG, just need to install postgis (and maybe standardize datum vocabulary and such).

Nicole-Ridgwell-NMMNHS commented 4 years ago

What is the status on this? We currently have all our UTMs in locality remarks, which is not great at all since that has been our collection's primary coordinate type. I'm also trying to add our new localities; I've converted the coordinates to lat/long, but I still need to record the original UTM. I can put it in collecting event, but there is no place to record datum.

dustymc commented 4 years ago

I recommend a new locality attribute.

I also have UTM "verbatim" in collecting event, and the lack of a locality.verbatim has caused us to stretch that in indefensible directions, but it's really designed for "came in as this, Arctos converted" rather than "the collector scrawled this on the label."

dustymc commented 4 years ago

You might also consider some structured data type - JSON or XML or whatever you're comfortable with - if you foresee a reason to need these as more than strings.

Nicole-Ridgwell-NMMNHS commented 4 years ago

Yes, we'll definitely need these as more than strings. We need to have fields for zone, northing, easting, and datum. I don't have a preference about what structured data type is used since I know nothing about the differences between those options.

It would be nice to have UTMs automatically convert to lat/long as Orig_Lat_Long_Units does in collecting event. Not sure that model would even work for us though since, because of the nature of our localities, we enter locality first.

Basically, having the UTMs automatically converted/made mappable would save us the task of having to convert roughly 8000 UTMs in some external program and then reloading those back into Arctos. Even if we did do that, we still need someplace to keep the original UTMs with datum.
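For anyone stuck doing that external conversion in bulk, the standard Transverse Mercator inverse (Snyder's series) can be scripted; this is an illustrative Python sketch only, not an Arctos feature, and it assumes WGS84 and the northern hemisphere:

```python
import math

def utm_to_latlon(zone, easting, northing, northern=True):
    """Invert UTM to geographic coordinates via the standard
    Transverse Mercator series. Assumes the WGS84 ellipsoid."""
    a = 6378137.0                      # WGS84 semi-major axis (m)
    f = 1 / 298.257223563              # WGS84 flattening
    k0 = 0.9996                        # UTM scale factor
    e2 = f * (2 - f)                   # first eccentricity squared
    ep2 = e2 / (1 - e2)                # second eccentricity squared
    x = easting - 500000.0             # remove false easting
    y = northing if northern else northing - 10000000.0
    m = y / k0                         # meridian arc length
    mu = m / (a * (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256))
    e1 = (1 - math.sqrt(1 - e2)) / (1 + math.sqrt(1 - e2))
    # footpoint latitude
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu))
    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = ep2 * cos1**2
    t1 = tan1**2
    n1 = a / math.sqrt(1 - e2 * sin1**2)
    r1 = a * (1 - e2) / (1 - e2 * sin1**2) ** 1.5
    d = x / (n1 * k0)
    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * ep2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * ep2 - 3 * c1**2) * d**6 / 720)
    lon = (d
           - (1 + 2 * t1 + c1) * d**3 / 6
           + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * ep2 + 24 * t1**2) * d**5 / 120) / cos1
    lon0 = math.radians(zone * 6 - 183)  # central meridian of the zone
    return math.degrees(lat), math.degrees(lon0 + lon)
```

A point at easting 500000 sits exactly on the zone's central meridian (zone 13 → longitude -105), which makes a handy sanity check.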

dustymc commented 3 years ago

I'm drastically upping priority.

https://arctos.database.museum/info/ctDocumentation.cfm?table=ctlat_long_units should contain nothing that can't be converted to DD.dd format, which is the de facto standard at this point. I can't convert UTM. It's been sorta-almost-maybe worth dealing with the complexities that introduces because we've been abusing "verbatim" (=as-entered). We no longer have reason to do that (https://github.com/ArctosDB/arctos/issues/2930), so it's time to wean ourselves off this. There are two options:

I'm not sure https://github.com/ArctosDB/arctos/issues/2660 is realistic with the current dev environment. I have no idea if TACC can support postgis in production (probably??). I don't think anything can be lost by stopping the not-quite-verbatim-mixed-with-other-stuff crazy now, and adding back UTM when/if we can treat it as spatial, so perhaps this is just a proposal to do whatever we need to do in order to get UTM out of ctlat_long_units without losing information.

Nicole-Ridgwell-NMMNHS commented 3 years ago

I am ok with moving UTM from "verbatim" coordinates in collecting event to a locality attribute. However, I need that locality attribute to have a structured data format. Unstructured is useless to me and will just be a mess that we have to reformat later.

UTMs are the primary coordinate format used by our curators and the BLM, so it is imperative for us that this data be in a usable format. If we can eventually get #2660 to work for us, that would be a huge help. Is postgis the only way to get UTM conversion? Google Earth can do it, no problem as long as it's in WGS84, so I would think there would be something out there????

Also, I know I could pull this potential new UTM attribute in a query using SQL, but it would be really helpful to be able to pull up this and other locality attributes as separate fields in the locality search.

dustymc commented 3 years ago

@Nicole-Ridgwell-NMMNHS you can just create a new locality attribute and enter eg

[{"utm_zone":"bla"},{"utm_northing":"12345"},{"utm_easting":"98765"}]

Let me know the attribute and your structure and I can set up a validation function.

Nicole-Ridgwell-NMMNHS commented 3 years ago

I'll need someone to show me how to create a new type of locality attribute, unless there is a how-to somewhere that I've missed.

I can create some sort of structure for the data and validation is good, but that still doesn't help with querying.

Edited: Actually can you just help me build SQL query to look for localities within certain ranges of northing and easting coordinates?

dustymc commented 3 years ago

how to create a new type of locality attribute

https://github.com/ArctosDB/arctos/issues/new?assignees=&labels=Function-CodeTables&template=authority-request.md&title=Code+Table+Request+-+

still doesn't help with querying.

I don't have an immediate solution for that. I can get you a place where your data aren't getting mixed with dissimilar data, and I can type them to meet whatever conditions you want to lay out. Beyond that requires more discussion.

build SQL query to look for localities within certain ranges

If you mean locality.dec_lat/dec_long then yes. If you mean with UTM then I think still yes, but it'll probably take me more than a few minutes.

Nicole-Ridgwell-NMMNHS commented 3 years ago

If you mean with UTM then I think still yes

Is there a particular way to format the attribute that will help with that, such as the format example you gave above? Would querying be easier if I create separate attributes for northing, easting, zone, and datum (similar to how we have Township/Range) & rely on the date fields to sort them out if more than one is entered?

dustymc commented 3 years ago

particular way to format the attribute

Something from which I can extract data - the example I gave can be treated as a data object.


arctosprod@arctos>> select jsonb_pretty('[{"utm_zone":"bla"},{"utm_northing":"12345"},{"utm_easting":"98765"}]'::jsonb);
          jsonb_pretty           
---------------------------------
 [                              +
     {                          +
         "utm_zone": "bla"      +
     },                         +
     {                          +
         "utm_northing": "12345"+
     },                         +
     {                          +
         "utm_easting": "98765" +
     }                          +
 ]
(1 row)

Time: 1.120 ms

similar to how we have Township/Range) & rely on the date

Those are probably less stable, but they're also more immediately accessible. I have no objections to that, but you'll need to be careful in how you enter them. If you do so, then we can convert them to something better at any time and what we do now isn't terribly important.

If you have to be paranoid anyway - and you do! - you could also do that as predictable strings, which I could also access easily - I don't care about the format, as long as it's always the same. If those are predictable enough - eg, 5 numbers then a letter then a space then.... - then I might even be able to control them, which would guarantee some level of functionality.

Jegelewicz commented 3 years ago

as long as it's always the same

That will NEVER happen. (OK, maybe not NEVER, but it is highly unlikely that it will happen over time.) In my bulkload files, UTM are formatted as "utm: zone 13 3834482N 561347E" and that is included in the locality remark. They SHOULD be consistent because I used a concatenation formula to create the bulkloaded remarks, but there are no guarantees.

create separate attributes for northing, easting, zone, and datum (similar to how we have Township/Range) & rely on the date fields to sort them out if more than one is entered?

I think that would be best for now.
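For what it's worth, if strings in that concatenated remark format do stay consistent, they can be parsed back out later; a hedged Python sketch (the pattern below assumes the exact "utm: zone 13 3834482N 561347E" shape described above, with no datum in the string):

```python
import re

# Pattern for remarks like "utm: zone 13 3834482N 561347E"
# (datum is not captured because it isn't present in the string).
UTM_REMARK = re.compile(r"utm: zone (\d{1,2}) (\d{7})N (\d{6})E")

def parse_utm_remark(remark):
    """Return (zone, northing, easting) as ints, or None if no match."""
    m = UTM_REMARK.search(remark)
    if not m:
        return None
    zone, northing, easting = m.groups()
    return int(zone), int(northing), int(easting)
```

Any drift in the format (extra spaces, missing "m", swapped order) makes the parse fail silently, which is exactly why a controlled format or typed attributes are safer.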

dustymc commented 3 years ago

That will NEVER happen.

I can make it happen, if ya'll can define a "datatype" that I can check.

Nicole-Ridgwell-NMMNHS commented 3 years ago

predictable strings

"zone "&##&" "&#######&"mN "&######&"mE "&AAA&" "&##

That last AAA&" "&## would be the datum - that seems the most difficult part to keep standard to me as you could still have someone enter, for example, WSG 83, instead of a correct value. Or what do you enter if datum is unknown?

Part of what I like about the separate attributes is that it would be easy to have code tables for zone and datum. And then maybe we could add a data check to make sure easting is a 6 digit integer and northing a 7 digit integer?

Jegelewicz commented 3 years ago

Part of what I like about the separate attributes is that it would be easy to have code tables for zone and datum. And then maybe we could add a data check to make sure easting is a 6 digit integer and northing a 7 digit integer?

Big 10-4 on that.

dustymc commented 3 years ago

I think "zone "&##&" "&#######&"mN "&######&"mE "&AAA&" "&## is slightly worse than a JSON object, and slightly better than multiple types theoretically united by date, but who knows. As long as this doesn't turn into you asking for something the data can't support, I'm willing to give it a go.

If I can find a datum value in your "data object" (string or JSON or whatever) then I can check it against https://arctos.database.museum/info/ctDocumentation.cfm?table=ctdatum. I could do the same with zone, but that's a code table request.

You can type individual attributes with the UI, I can add additional checks if that's not sufficient.

Nicole-Ridgwell-NMMNHS commented 3 years ago

slightly worse than a JSON object

Sorry, I had meant to use that format and forgot. So: [{"utm_zone":"##"},{"utm_northing":"#######"},{"utm_easting":"######"},{"utm_datum":"use code table value here"}] with rules: utm_zone: 2-digit integer; utm_northing: 7-digit integer; utm_easting: 6-digit integer; utm_datum: must match a value in the datum code table.

Visually not the most appealing thing and definitely awkward for data entry, but I like that I wouldn't have separate date, remark, etc. fields for each.

Downside is that if we eventually get a better search display for localities in the UI no one could sort by these or download them in individual columns, it would have to be done with SQL.

dustymc commented 3 years ago

rules:...utm_zone: 2 digit integer...

That should not be much problem to write.

Visually not the most appealing thing

Your browser understands JSON - manipulating those strings into something else (a table might work for this) is (in theory) technically fairly trivial (but still takes time, of which I seem to have negative reserves lately).

eventually get a better search display for localities in the UI

In theory, again, there's lots of magic to be had there too, but practically I'm not so sure - I spent a day or two banging my head on that a while back and didn't get very far. I imagine future versions of PG will have more capabilities, but who knows. Even if that never meets reality, having the data in a typed object should ensure that they're extractable if we ever come up with a better solution.

Should I get started on a check function using the format you gave above, or does this need more discussion?
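For reference, those rules are easy to express as a check; an illustrative Python version follows (the real validation would be a database function, and KNOWN_DATUMS here is a placeholder spelling, not the actual ctdatum code table values):

```python
import json
import re

# Placeholder for the ctdatum code table; the real list lives at
# https://arctos.database.museum/info/ctDocumentation.cfm?table=ctdatum
KNOWN_DATUMS = {"WGS84", "NAD27", "NAD83", "unknown"}

RULES = {
    "utm_zone": re.compile(r"^\d{2}$"),      # 2-digit integer
    "utm_northing": re.compile(r"^\d{7}$"),  # 7-digit integer
    "utm_easting": re.compile(r"^\d{6}$"),   # 6-digit integer
}

def check_utm_attribute(value):
    """Return a list of problems with one attribute value; empty list = valid."""
    try:
        # merge the array-of-single-key-objects format into one dict
        pairs = {k: v for obj in json.loads(value) for k, v in obj.items()}
    except (ValueError, TypeError, AttributeError):
        return ["not a JSON array of objects"]
    problems = []
    for key, pattern in RULES.items():
        if key not in pairs:
            problems.append("missing " + key)
        elif not pattern.match(str(pairs[key])):
            problems.append("bad " + key + ": " + str(pairs[key]))
    if pairs.get("utm_datum") not in KNOWN_DATUMS:
        problems.append("utm_datum not in code table")
    return problems
```

Returning a list of problems rather than a boolean makes the check usable both for rejecting bad input and for reporting what needs fixing.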

Jegelewicz commented 3 years ago

I think this needs more discussion.

Why the heck don't we just create the appropriate fields for UTM?

@ccicero @campmlc @mkoo

dustymc commented 3 years ago

Why the heck don't we just create the appropriate fields for UTM?

Elaborate, please.

Jegelewicz commented 3 years ago

We have been beating this dead horse forever - https://github.com/ArctosDB/arctos/issues?q=is%3Aissue+utm

EVERY MS Access/FileMaker Pro/Excel "database" has these fields, why are we so resistant to them?

Jegelewicz commented 3 years ago

In Collecting Event we should have the following:

utm_zone
utm_northing
utm_easting
utm_datum

dustymc commented 3 years ago

That would be a big step away from viewing place data as place data and towards viewing them as attributes of catalog records. I'm not sure that's wrong but it's very different. It's certainly not something that can be resolved way down here.

Adding them as collecting event attributes allows that viewpoint without forcing it down anyone else's throat.

campmlc commented 3 years ago

So is there a problem with adding the following as separate event attributes, with controlled vocab as needed: utm_zone, utm_northing, utm_easting, utm_datum? I agree that capturing these data should not be this hard, and it is clearly a community need. If we have to tweak the data model to meet global and community needs for data capture, then so be it.


dustymc commented 3 years ago

there a problem with adding

No, but there's a problem with retrieving from that.

There are no problems with capturing anything in the current model.

There is a discussion on how things should be captured so they're most usable. That'll probably evolve into UI at some point, but as far as I know the only immediate concerns involve the bulkloader.

Jegelewicz commented 3 years ago

We can concatenate up some record attributes into this:

[screenshot]

It seems that all we need is something to link those four attributes so that they could be presented similarly? Date and Determiner could be that link?

Nicole-Ridgwell-NMMNHS commented 3 years ago

Currently the only structured way to enter UTM is in collecting event, which is why I had immediate concerns with the bulkloader.

However, actually having that data in collecting event is useless to me. I don't just need to be able to document UTMs, I need to be able to use them as actual spatial data on a locality level. When they're in collecting event, I have to manage one coordinate in a bunch of different events, which just isn't feasible. Collecting event is supposed to contain the "verbatim" coordinate, but I need one "authoritative" coordinate for each locality.

A large portion of our locality coordinates are only in UTM. Mass converting these coordinates would solve that, but I don't have the resources for that yet, plus I'd still have to convert them back frequently because most of the people using our collection work with UTM.

With the current state of things (UTM in collecting event), I have to use our old Access database from 2019 to respond to requests for spatial queries, which is not ok or feasible for much longer.

dustymc commented 3 years ago

Date and Determiner could be that link?

That's not a terribly stable link, whether it's stable enough is a curatorial call. FWIW I don't see it as particularly dangerous, especially with the archive logging changes, but I'd also not want to imply in any way that machines might be able to reliably reassemble those data.

That model also can't guarantee all of the parts exist - there's no realistic way I could prevent you from entering 864 values of "utm_northing" and nothing else.

Given good documentation so users have an understanding of the limitations, I don't object.

Jegelewicz commented 3 years ago

Currently the only structured way to enter UTM is in collecting event, which is why I had immediate concerns with the bulkloader.

However, actually having that data in collecting event is useless to me. I don't just need to be able to document UTMs, I need to be able to use them as actual spatial data on a locality level. When they're in collecting event, I have to manage one coordinate in a bunch of different events, which just isn't feasible. Collecting event is supposed to contain the "verbatim" coordinate, but I need one "authoritative" coordinate for each locality.

A large portion of our locality coordinates are only in UTM. Mass converting these coordinates would solve that, but I don't have the resources for that yet, plus I'd still have to convert them back frequently because most of the people using our collection work with UTM.

With the current state of things (UTM in collecting event), I have to use our old Access database from 2019 to respond to requests for spatial queries, which is not ok or feasible for much longer.

If these were separate COLLECTING event attributes that could be searched in the same way we can search locality attributes:

[screenshot]

I think that would get closer to what Nicole needs, but I am not sure it will get her everything she needs (searching a range of zones, northings or eastings). It could also mean that she ends up with localities that have events with differing UTM, but I think that just speaks to the need for research on those events.

Nicole-Ridgwell-NMMNHS commented 3 years ago

I am not sure it will get her everything she needs

It gets me pretty much nothing that I need.

It could also mean that she ends up with localities that have events with differing UTM

Right now I have one UTM per locality, which are stored in multiple collecting events. If I end up with differing UTMs because managing one "authoritative" coordinate in 50 different events is difficult, then I no longer have one "authoritative" UTM, I just have a mess.

dustymc commented 3 years ago

that could be searched in the same way we can search locality attributes:

Not much problem to add that, I've been ignoring it because it's one of our many "emergencies" that never actually got used.

searching a range

If you want to do numeric things, you need numeric data. There can't be a "range" operation on 3834482N which can't be anything other than a string. Individual attributes (with units) are again probably the most immediately usable way to get that, although I could (in theory) extract "fields" from a JSON object and cast them to integers.
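For illustration, extract-and-cast over the JSON format could look like this in Python (the in-database version would use jsonb operators instead; field names assumed from the earlier example):

```python
import json

def utm_fields(attribute_value):
    """Merge the array-of-single-key-objects format into one dict."""
    merged = {}
    for obj in json.loads(attribute_value):
        merged.update(obj)
    return merged

def in_northing_range(attribute_value, lo, hi):
    """True when utm_northing, cast to int, falls within [lo, hi]."""
    northing = int(utm_fields(attribute_value)["utm_northing"])
    return lo <= northing <= hi
```

The cast is the point: once the value is an integer rather than a string like "3834482N", range comparisons become trivial.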

ends up with localities that have events with differing UTM,

That's just a "feature" of the model - lots of fairly precise event data gets tossed into relatively imprecise localities for various reasons.

Right now I have one UTM per locality

That simplifies reassembly as long as it remains true.

no longer have one "authoritative" UTM, I just have a mess.

So this is a request for a/some LOCALITY attribute(s), not collecting_event, correct?

Jegelewicz commented 3 years ago

managing one "authoritative" coordinate in 50 different events is difficult, then I no longer have one "authoritative" UTM, I just have a mess.

@dustymc Can event attributes be bulk managed? Or SHOULD these be a locality attribute?

There can't be a "range" operation on 3834482N which can't be anything other than a string.

If the data is in a field "utm_northing" then the "N" would not be part of the data.

dustymc commented 3 years ago

Can event attributes be bulk managed?

"Could" anyway - I'm not sure what's available now, we can build tools if there's a real-world use case for them.

Nicole-Ridgwell-NMMNHS commented 3 years ago

So this is a request for a/some LOCALITY attribute(s), not collecting_event, correct?

Yes. I'm ok with creating some sort of collecting event attribute for verbatim UTM, but what I actually need right now is an authoritative locality level UTM.

Nicole-Ridgwell-NMMNHS commented 3 years ago

Individual attributes (with units) are again probably the most immediately usable way to get that, although I could (in theory) extract "fields" from a JSON object and cast them to integers.

If it will be easier to do numeric things with individual attributes, then let's just use individual attributes.

dustymc commented 3 years ago

authoritative locality level UTM.

Is this just a conversion problem? Given enough tools and UTM data, would/could/should you

Jegelewicz commented 3 years ago

OK, so how about we create the following LOCALITY attributes with rules:

utm_zone: 2-digit integer
utm_northing: 7-digit integer
utm_easting: 6-digit integer
utm_datum: must match a value in the datum code table

AND then we would link these up using date and determiner to look like:

UTM
zone | northing | easting | datum   | determiner | determination date
13   | 1111111  | 111111  | unknown | jegelewicz | 2021-01-27

What else needs doing to make this work?
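The linking step itself is just a group-by on (determiner, date); a Python sketch of that reassembly, with attribute rows assumed as (name, value, determiner, date) tuples:

```python
from collections import defaultdict

def assemble_utm_rows(attributes):
    """Group utm_* attribute rows sharing determiner+date into one record each."""
    rows = defaultdict(dict)
    for name, value, determiner, date in attributes:
        if name.startswith("utm_"):
            rows[(determiner, date)][name] = value
    return [dict(fields, determiner=det, determination_date=date)
            for (det, date), fields in sorted(rows.items())]
```

As noted above, nothing in this model guarantees all four parts exist for a given determiner+date, so consumers still need to handle partial rows.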

Jegelewicz commented 3 years ago

Given enough tools and UTM data, would/could/should you

I'd still have to convert them back frequently because most of the people using our collection work with UTM.

Nicole-Ridgwell-NMMNHS commented 3 years ago

OK, so how about we create the following LOCALITY attributes

...

AND then we would link these up using date and determiner to look like:

Yes

Given enough tools

Eventually yes, ideally Arctos would auto convert for me, but probably we'll just end up with a GIS geodatabase of our localities, auto-convert everything to lat/long for Arctos & have a process for reconciling the two databases once a month. That GIS functionality is probably at least a couple years away for us right now, and in the meantime, I need to be able to retrieve, in a usable way, the UTMs in Arctos.

dustymc commented 3 years ago

ideally Arctos would auto convert

That's what I was attempting to imply with "enough tools." I was really asking if we can solve this by prioritizing https://github.com/ArctosDB/arctos/issues/2660, which would let us convert to and from anything that postgis understands on demand. (It would also let us accept UTM as "as-entered" - not sure if that's relevant to your needs, but it's in there anyway!)

Nicole-Ridgwell-NMMNHS commented 3 years ago

I was really asking if we can solve this by prioritizing #2660

Yes, that would be the ideal solution to this.

dustymc commented 3 years ago

ideal solution

Then please prioritize that issue! I'll ping TACC to see what we'd need to do to get postgis.

Nicole-Ridgwell-NMMNHS commented 3 years ago

In the meantime, if this is the way we're going, can we add UTM back to the specimen event bulkloader?

dustymc commented 3 years ago

That still seems most-evil to me, but I'm just pushing buttons...

If we do that it'll be a few days to undo what I've done, then we can pull some ambiguous data, then a few more days to un-un-do the changes, and I don't think any of that helps do what you say you need to do.

Suggest:

Nicole-Ridgwell-NMMNHS commented 3 years ago

Well, at the moment we have ~12,000 localities with UTM data in 1) locality remarks (concatenated with other data, no datum) and 2) collecting event (structured, but repeated across x number of collecting events). Once we get postgis working, that can be converted (pick one collecting event for each and convert that?).

If the ~300 localities I'm trying to enter now are entered in the same way they can be mass converted with everything else. Or, if I go the temporary attribute route then that will be a separate conversion.

dustymc commented 3 years ago

Yes I can help you dig those data out of wherever they end up, and if we've already got to dig part of them out of locality remarks maybe it makes the most sense to put them all there (preferably in some easy-to-extract format).

Nicole-Ridgwell-NMMNHS commented 3 years ago

After we add postgis, we're not getting rid of "verbatim" coordinates in collecting event, including UTMs, correct? As long as we're keeping those fields, I still need to enter data into them.

Reading through #2930 collecting_event.verbatim coordinates = what was entered into Arctos

So I should still use those fields to keep track of how the coordinate was originally supplied to Arctos. Which means I need UTMs added back into the specimen bulkloader.

dustymc commented 3 years ago

Collecting event verbatim coordinates is "what was fed to Arctos, then converted to DD.dd for locality." Except I can't convert UTM, plus it's editable, so it's really a random jumble of random things. With postgis I will be able to convert, so it can be treated like DD MM SS N/S (or anything else convertible).

Without postgis, there was some crazy thing I did to treat UTM as "verbatim." With postgis it'll just be input, like anything else I can convert. I just don't want to add the 'crazy' back in!

Verbatim exists because I don't have tools and I didn't trust the stuff I wrote to convert very much. With real tools, I'm not sure the concept makes sense at all - we should just be able to convert whatever to whatever whenever we want using tools that have been rigorously tested by thousands/millions of users. I don't see a reason to rush to get rid of it, but I'm not sure it'll be around (at least not serving the same purpose) in a year or two either.

Some sort of 'verbatim' (beyond/beside verbatim locality) may make some kind of sense in your workflow, but that's easy under an attribute model.

Make sense?