Closed dustymc closed 5 years ago
What is missing from the current model is: "this place name" -> "these coordinates" The assertion is that such and such a locality name (eg river valley ca. 35km sw Mordor) has "these coordinates." There is a huge area of potential ambiguity and skill involved in making such an assertion. The current model has no means to document who did this or when (but it does capture how it was done, eg GPS, Google Maps, etc.).
My take (and see http://arctosdb.org/documentation/places/specimen-event/) is that it's the job of the agent listed in specimen_event to determine whether "these coordinates" and "river valley ca. 35km sw Mordor" have anything useful to do with each other in the context of a specimen, and simply not use them if they don't.
From a user's perspective, I don't see much difference between "came up with coordinates" and "hooked specimen to coordinates of indeterminate origin." I'm probably missing some curatorial use...
The old model structure had place name as primary data, with coordinates determined (by a person, etc.) from it.
place_name
---->coordinates (accepted_or_not_flag)
---->coordinates (accepted_or_not_flag)
---->...
which is incompatible with....
coordinates (downloaded from my WAAS-enabled FAA-certified GPS)
---->vague and sometimes wrong place name, filled in by someone for some strange reason
Coordinates and placenames are complementary in the current model - they're in the same table and functionally equivalent, part of the same THING.
GEOREFERENCE_PROTOCOL should provide coordinates-from-description vs. description-from-coordinates directionality.
"GPS download" and "GPS transcription" (the best and debatably-second-best source of coordinates) seem to be buried in GEOREFERENCE_SOURCE (with 7K other values) - and some of mine (which were downloaded) are entered as just "GPS" and so aren't distinguishable from the (normal) "transcribed from the transcription in the field notes" data (which have a large error rate). I have no idea what we're TRYING to do with these fields, but I don't think we're doing it.
GEOREFERENCE_SOURCE is obviously acting as a huge denormalizer (what I've been trying to avoid with the addition of a who/when). The data are mostly variations of a very few things (collector did it, found it on a map, MaNIS, GeoLocate). I completely fail to understand how "2007, Google Earth Maps, Europa technologies, Eye alt=11528 ft" is going to allow me to end up with the same coordinates (and that WAS the point; this field was invented for MaNIS), and if it can't do that I'm not sure what it is useful for.
select count(*) from locality;
713299
UAM@ARCTOS> select count(*) from locality where dec_lat is not null;
575983
UAM@ARCTOS> select count(distinct(dec_lat || dec_long)) from locality;
181937
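For anyone who wants to poke at those three counts without touching production, here is a minimal sketch on a toy table. It assumes an in-memory SQLite database with only the `dec_lat`/`dec_long` columns named in the queries above; the sample rows are invented and the columns are stored as text purely for illustration. It also shows one caveat with the third query: concatenating without a delimiter can merge two different coordinate pairs into the same string, so the distinct count may come out slightly low.

```python
import sqlite3

# Toy stand-in for the locality table; only the two columns used above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locality ("
    "locality_id INTEGER PRIMARY KEY, dec_lat TEXT, dec_long TEXT)"
)
rows = [
    ("64.85", "-147.84"),
    ("64.85", "-147.84"),  # same coordinates, second locality row
    ("1.5", "12.25"),      # these two differ, but concatenate
    ("1.51", "2.25"),      # to the same string '1.512.25'
    (None, None),          # locality without coordinates
]
conn.executemany("INSERT INTO locality (dec_lat, dec_long) VALUES (?, ?)", rows)


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


total = scalar("SELECT count(*) FROM locality")
with_coords = scalar("SELECT count(*) FROM locality WHERE dec_lat IS NOT NULL")

# Same shape as the query in the thread: no delimiter between the parts.
naive_distinct = scalar(
    "SELECT count(DISTINCT dec_lat || dec_long) FROM locality"
)
# A delimiter keeps ('1.5', '12.25') and ('1.51', '2.25') apart.
safe_distinct = scalar(
    "SELECT count(DISTINCT dec_lat || '|' || dec_long) FROM locality"
)

print(total, with_coords, naive_distinct, safe_distinct)
```

On real coordinate data such collisions are probably rare, so the point of the numbers above (far fewer distinct coordinate pairs than locality rows) should hold either way.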
Our "compromise" model is the "let's denormalize JUST until nobody's happy" model. Let's fix it. I see two ways out:
1) I'll just add whatever anyone wants to Locality, but I get to drop all pretenses of "duplicate." We accept that these data are denormalized by the addition of metadata and stop pretending that locality_id is anything other than a primary key in a table. (Lots of folks try to use locality_id as a "site identifier.") "Locality Nickname" (actual site identifier) remains, so the people who DO want unique-at-some-scale localities can get them through it. (I'd suggest we merge collecting event and locality - what's the point of a "verbatim" table if everything in the next table upstream is verbatim too? - but I think that would break paleo's multi-year named locality data.)
2) We normalize. I have no idea what that means from here. Something about a unique index on coordinates/error/depth/elevation with metadata elsewhere or not existing (I still have no idea what we're trying to DO with whodunit or GEOREFERENCE_SOURCE) or something.
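To make option 2 a bit more concrete, here is a rough sketch of "a unique index on coordinates/error/depth/elevation with metadata elsewhere." Everything in it is an assumption for discussion: the table names `place` and `place_determination`, the column names, and the choice to make every shape column NOT NULL (NULLs are treated as distinct in a unique index, which would defeat the point). It is not a proposal for the actual Arctos schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical: the geospatial "shape" is a fact and exists once.
CREATE TABLE place (
    place_id    INTEGER PRIMARY KEY,
    dec_lat     REAL NOT NULL,
    dec_long    REAL NOT NULL,
    max_error_m REAL NOT NULL,
    elevation_m REAL NOT NULL,
    depth_m     REAL NOT NULL
);
CREATE UNIQUE INDEX place_shape
    ON place (dec_lat, dec_long, max_error_m, elevation_m, depth_m);

-- Hypothetical: who/when lives elsewhere and may repeat per place.
CREATE TABLE place_determination (
    place_id        INTEGER NOT NULL REFERENCES place(place_id),
    determined_by   TEXT NOT NULL,
    determined_date TEXT NOT NULL
);
""")

SHAPE_MATCH = (
    "dec_lat = ? AND dec_long = ? AND max_error_m = ? "
    "AND elevation_m = ? AND depth_m = ?"
)


def record_reading(shape, who, when):
    """Reuse the place if the shape already exists, then log who/when."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO place "
            "(dec_lat, dec_long, max_error_m, elevation_m, depth_m) "
            "VALUES (?, ?, ?, ?, ?)",
            shape,
        )
        place_id = conn.execute(
            "SELECT place_id FROM place WHERE " + SHAPE_MATCH, shape
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO place_determination VALUES (?, ?, ?)",
            (place_id, who, when),
        )
    return place_id


# Two people read the same numbers off the same GPS.
gps = (64.85, -147.84, 10.0, 135.0, 0.0)
first = record_reading(gps, "Reader A", "2015-09-15")
second = record_reading(gps, "Reader B", "2015-09-15")

place_count = conn.execute("SELECT count(*) FROM place").fetchone()[0]
determination_count = conn.execute(
    "SELECT count(*) FROM place_determination"
).fetchone()[0]
print(first == second, place_count, determination_count)
```

In this arrangement two identical readings share one place and produce two who/when rows, so the metadata no longer creates "duplicates." Whether that who/when is useful for anything is still the open question.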
We need to consider usability in terms of the bulkloader and data entry screens before we do anything. Can we drop specimenevent{agent/date} if we have looks-the-same-from-here data in locality, or are those data different (include cultural collns if we have this conversation)? Can we streamline anything else? Does this solve the "verbatim coordinates" issue (by making everything verbatim)? What do users have difficulty with now? (I think just the # of "locality fields" is a major issue.) Etc. Let's don't make this MORE unusable.
If we're going to fix localities, we should also discuss the relationship between placename/coordinates and higher geography. Collected from one "locality," (esp. eg, coastal AK) a fish (seal/etc.) is likely to end up in some sea, a moose in some GMU, a lemming in a quad, a plant in some state park, etc., etc., etc., and that's actively preventing discovery by anyone trying to use higher geography.
See also #739.
Sorry I don't have time to fully digest all you wrote but I don't think you've addressed the issue. Take any record like this: http://arctos.database.museum/guid/UAM:Ento:118788 A user visits and sees that record. They might wonder if the lat/longs were estimated & confirmed by the collector, or later georeferenced by someone else (and if so, whom?). There is some evidence visible - the specimen was collected in 1981 but the coordinates were obtained by use of Google Earth which clearly wasn't present in 1981. Thus we know it wasn't the collector who assigned them, but who did? There is the line " accepted place of collection assigned by Derek S. Sikes on 2010-10-13 " but isn't that rather ambiguous? What did I do when I assigned this specimen to that place of collection? Did I look at a label and transcribe it perfectly (no added coordinates) or did I do some interpretation and add coordinates, or make other changes? And what about if I visit the locality record after someone else has assigned a specimen to a locality, and I see an error (like an Alaskan place mapping in China, and fix it)? I've now changed the coordinates but there is no record that it was me who did so.
My argument remains this: The agent listed in specimen_event.assigned_by is responsible for everything in the locality stack. The model can be interpreted no other way. (But read the last paragraph before you sharpen your pitchfork!)
Collector provided descriptive data? Collector should be specimen_event.assigned_by.
Collector somehow came up with coordinates? Collector should be specimen_event.assigned_by.
Student somehow came up with better coordinates? Student should be specimen_event.assigned_by.
Curator tightened up the error? Curator should be specimen_event.assigned_by.
Yes, lacking something in verificationstatus "assigned by" is ambiguous. You tossed a dart at your map for all I know - and the same is true for most things - which is why #739 proposes....
unverified
Definition: No assertion regarding the accuracy of the place and time information is made.
Migration Path: No changes.
visit the locality record after someone else has assigned a specimen
If you have access to someone else's collection, they trust you to edit their locality. If you don't have access to their collection, you'll have to split the locality and edit that. (#740 may change that.)
no record that it was me
A specimen can have any number of specimen-events, so just leave the old and add a new if you wish to maintain that history.
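A minimal sketch of what "leave the old and add a new" could look like, assuming a heavily simplified `specimen_event` table. The `guid`, `locality_id` and `assigned_date` columns, the agent names, the dates, and the use of 'unaccepted' as a status for the superseded event are all illustrative assumptions, not the real Arctos definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE specimen_event (
        specimen_event_id  INTEGER PRIMARY KEY,
        guid               TEXT NOT NULL,     -- hypothetical specimen key
        locality_id        INTEGER NOT NULL,
        assigned_by        TEXT NOT NULL,
        assigned_date      TEXT NOT NULL,
        verificationstatus TEXT NOT NULL
    )
""")


def add_event(guid, locality_id, agent, date, deprecate_old=False):
    """Add a specimen event; optionally mark earlier ones as unaccepted."""
    with conn:
        if deprecate_old:
            conn.execute(
                "UPDATE specimen_event SET verificationstatus = 'unaccepted' "
                "WHERE guid = ?",
                (guid,),
            )
        conn.execute(
            "INSERT INTO specimen_event "
            "(guid, locality_id, assigned_by, assigned_date, verificationstatus) "
            "VALUES (?, ?, ?, ?, 'unverified')",
            (guid, locality_id, agent, date),
        )


# The collector's original placement, then a curator's later one.
add_event("EXAMPLE:Ento:1", 1, "Collector A", "1981-06-01")
add_event("EXAMPLE:Ento:1", 2, "Curator B", "2010-10-13", deprecate_old=True)

history = conn.execute(
    "SELECT assigned_by, assigned_date, verificationstatus "
    "FROM specimen_event WHERE guid = ? ORDER BY specimen_event_id",
    ("EXAMPLE:Ento:1",),
).fetchall()
print(history)
```

Both rows survive, so who placed the specimen where, and when, stays visible. The cost is one extra locality stack per revision.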
The model is pretty rigorous, things like normalization aside - it's hard to find a situation that doesn't work (if you buy into my definitions). BUT, I'm increasingly unsure that it's realistic for anyone to use the thing in a way that actually makes all that work. Doing so would require a lot of specimens having a lot of localities (eg, 4 in the above example), and things that can now be done with a click or two (update coordinates of unverified localities) would have to be done with lotsa-clicks (add/edit a new locality), etc. I don't think I can write interfaces to simplify that without introducing some sort of unexpected complications elsewhere. Given those two things, I suggest we back up and re-analyze what sort of locality (in the broadest sense) data we want and what we expect to do with it, then design a model which does that. If that's not possible (and it probably isn't, short-term), I propose we drop some expectations (eg, localities being somewhat-unique) and cram whatever we need to answer whatever questions y'all want answered into the current model.
I like this "Collector provided descriptive data? Collector should be specimen_event.assigned_by.
Collector somehow came up with coordinates? Collector should be specimen_event.assigned_by.
Student somehow came up with better coordinates? Student should be specimen_event.assigned_by.
Curator tightened up the error? Curator should be specimen_event.assigned_by."
and it's what we try to do mostly... there are problems with usability though.
If the curator edits the locality record why not have Arctos auto-magically change 'specimen_event.assigned_by' to that agent's name & the new date? Doing this manually when editing lots of locality records just doesn't happen.
have Arctos auto-magically change 'specimen_event.assigned_by' to that agent's name & the new date?
Magical probably needs to go through the group (might not be a bad default behavior) but...
can go out with the next Arctos release.
that change sounds great to me.
+++++++++++++++++++++++++++++++++++ Derek S. Sikes, Curator of Insects Associate Professor of Entomology University of Alaska Museum 907 Yukon Drive Fairbanks, AK 99775-6960
dssikes@alaska.edu
phone: 907-474-6278 FAX: 907-474-5469
University of Alaska Museum - search 302,939 digitized arthropod records http://www.uaf.edu/museum/collections/ento/ +++++++++++++++++++++++++++++++++++
Interested in Alaskan Entomology? Join the Alaska Entomological Society and / or sign up for the email listserv "Alaska Entomological Network" at http://www.akentsoc.org/contact.php
This is a long thread (still digesting) but I just want to say that I think Derek pointed out some of our biggest issues with the current model of locality: disassociating the georeferencer from the updater (usually a curatorial assistant student or curator vs. the collector). Somewhere along the way to this newer locality model we also made it harder to see the unaccepted coordinates. Tracking history of change is even harder now since it's several clicks away from creating a new specimen event. I still argue that having that versioning of 'georeferences' can be invaluable and nicely mimics legacy curatorial practices of striking out data but not erasing it so future curators can see a history of data updates or fixes.
So in addition to the "push my name+date to specimen event" option, which addresses Derek's important point, a third option could be to create a new specimen event, deprecating the previous one as 'unaccepted' and saving that history.
This is a long thread
tl;dr: so let's build a new model.
disassociating the georeferencer
NOBODY (should) CARES! The shape/description has something useful to do with a specimen or not. I don't care if someone used a random coordinate generator (map+dart?) and got lucky, ALL that matters is your assertion that a specimen belongs there.
If you insist on caring, then you can't also care (much) about "duplicates" (and near-duplicates), at least not in this model.
harder to see the unaccepted coordinates
Someone asked for that - figure it out in the group and I can easily turn them back on (in the non-tabular forms - multiple anything-that-doesn't-concatenate remains a problem in tables).
Tracking history of change is even harder now since it's several clicks away from creating a new specimen event.
But it was impossible in the old model! Unless you're talking about JUST coordinates (eg, 2 of the three dimensions of a shape), which is an extremely limited (and I believe severely abused in the old) use case.
I still argue that having that versioning of 'georeferences' can be invaluable and nicely mimics legacy curatorial practices of striking out data but not erasing it so future curators can see a history of data updates or fixes.
Old model, you could add coordinates. New model, you can add coordinates - and also change the county while tracking the old. (And deal with depth/elevation.) The new model saves everything the old can, and a lot more, slightly differently, and associates it with specimens in a more-functional way. "Legacy curatorial practices" were developed before GPS was a thing and involved pretending that parasites were parts of hosts, that hosts are just a string in a text field, that cultural collections are not largely made out of things with interesting DNA, and that we'll never encounter an individual twice or send bits to two collections.
So in addition to the "push my name+date to specimen event" option, which addresses Derek's important point, a third option could be to create a new specimen event, deprecating the previous one as 'unaccepted' and saving that history.
If it's JUST specimen events you're talking about, I can do that. But you're probably not because nothing is important there - you want the old coordinates/continent/etc., right? See #579 - we can (probably) do that but it's far from trivial.
Let's start blank-slate; tell me what data you have, why you care about it, what you want to do with it, etc., and we'll find a model that does that. (I've got a short list of things that I care about too, but I think they're all pretty trivial/obvious.)
Or if that seems overwhelming we can patch who/when into the current model, but again that does come with a cost in what can be done elsewhere. (And I'm not sure it addresses your concerns??) Maybe the blank-slate approach leads here, maybe not, but it would be good to find out before we end up in some sort of panic situation (what led to the current model).
magically change 'specimen_event.assigned_by' to that agent's name & the new date
is implemented w/ https://github.com/ArctosDB/arctos/tree/v7.0.5, leaving this open.
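For readers following along, the behavior being discussed (editing a locality pushes the editor's name and the date into `specimen_event.assigned_by`) might look roughly like the sketch below. This is NOT the code in the linked release. The table layout (specimen_event -> collecting_event -> locality), the `assigned_date` column, the function name `edit_locality`, and all sample values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locality (
    locality_id INTEGER PRIMARY KEY, dec_lat REAL, dec_long REAL);
CREATE TABLE collecting_event (
    collecting_event_id INTEGER PRIMARY KEY,
    locality_id INTEGER REFERENCES locality(locality_id));
CREATE TABLE specimen_event (
    specimen_event_id INTEGER PRIMARY KEY,
    collecting_event_id INTEGER
        REFERENCES collecting_event(collecting_event_id),
    assigned_by TEXT, assigned_date TEXT);

-- Invented rows: locality 1 is an Alaskan place mapping in China.
INSERT INTO locality VALUES (1, 35.0, 103.0), (2, 64.85, -147.84);
INSERT INTO collecting_event VALUES (10, 1), (20, 2);
INSERT INTO specimen_event VALUES
    (100, 10, 'Student A', '2010-10-13'),
    (101, 10, 'Student A', '2010-10-13'),
    (200, 20, 'Collector B', '1981-06-01');
""")


def edit_locality(locality_id, dec_lat, dec_long, editor, edit_date):
    """Fix the coordinates and stamp every dependent specimen event."""
    with conn:  # both updates commit together or not at all
        conn.execute(
            "UPDATE locality SET dec_lat = ?, dec_long = ? "
            "WHERE locality_id = ?",
            (dec_lat, dec_long, locality_id),
        )
        conn.execute(
            "UPDATE specimen_event SET assigned_by = ?, assigned_date = ? "
            "WHERE collecting_event_id IN ("
            "  SELECT collecting_event_id FROM collecting_event "
            "  WHERE locality_id = ?)",
            (editor, edit_date, locality_id),
        )


edit_locality(1, 64.9, -147.8, "Curator C", "2015-09-15")
events = conn.execute(
    "SELECT specimen_event_id, assigned_by, assigned_date "
    "FROM specimen_event ORDER BY specimen_event_id"
).fetchall()
print(events)
```

Note the tradeoff: the previous assigner is overwritten, so this answers "who last touched it" but keeps no history. That is the gap the "create a new specimen event instead" suggestion is aimed at.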
From UAM:Ento:
The data entry screen (and bulkloader) treat locality coordinates (locality.dec_lat, locality.dec_long) and verbatim coordinates (collecting_event.verbatim_coordinates) as the same thing to simplify usability. That's possible to change, but would require the addition of "duplicate" coordinate fields and several (around 30) extra columns to the bulkloader, which I suspect would come with significant usability issues. I am completely open to better ideas.
Arctos now provides for any number of "locality stacks" (everything between specimens and higher geography), so one way to deal with this is by entering two localities:
which is of course twice as much work with what seems to me an insignificant benefit - if you mis-typed (or downloaded from your GPS) the "verbatim" you probably did the same for the "spatial."
This was present in the old model and was by design excluded from the new. Under the current model (and any GIS system or map), [X, Y +/- Z, incl. datum etc.] is a defined geospatial area; it's a fact, a data object - I can find it on a map or go there or compare it to other areas. The assertion (via specimen_event.assigned_by_agent, specimen_event.assigned_on_date) is "this specimen<--->{specimen_event_type}<--->that place."
Pull 20,000 bugs out of a trap (at the same time, across centuries, whatever, the PLACE is all the same), enter them wrong, you can fix them all by updating one thing here.
Under the old model and what you're proposing, {[X, Y +/- Z, incl. datum etc.] + who/when} is an assertion, or at least a likely duplicate. Nail a GPS to the ground, we both read the same numbers off of it, we have two "places." You read it, and then do so again a second (the precision of Oracle's DATE type) later with the same place-results, we have two places. I guess I'm not opposed to that model, but I am very opposed to that model under our current data structure. If any two specimens are exceedingly unlikely to share a place, then why do we need place as a data object at all? If we have a time component to places, why do we have another time component one join away? Why not move it all closer to specimens (something like Attributes) and simplify the model?
Pull 20,000 bugs out of a trap, enter them wrong, you'll need to update 20,000 things here.
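The one-update vs. 20,000-updates contrast can be shown on toy tables. A sketch only: `specimen_shared` and `specimen_own_place` are hypothetical tables standing in for "place as a shared data object" and "place data carried per specimen with who/when", and the coordinates and names are invented.

```python
import sqlite3

N = 20_000  # bugs out of one trap

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared-place model: specimens point at one locality row.
CREATE TABLE locality (
    locality_id INTEGER PRIMARY KEY, dec_lat REAL, dec_long REAL);
CREATE TABLE specimen_shared (
    specimen_id INTEGER PRIMARY KEY,
    locality_id INTEGER REFERENCES locality(locality_id));

-- Per-specimen model: each specimen carries its own place + who/when.
CREATE TABLE specimen_own_place (
    specimen_id INTEGER PRIMARY KEY,
    dec_lat REAL, dec_long REAL,
    determined_by TEXT, determined_date TEXT);
""")

# Everything is entered with the same wrong coordinates.
conn.execute("INSERT INTO locality VALUES (1, 46.85, 147.84)")
conn.executemany(
    "INSERT INTO specimen_shared VALUES (?, 1)",
    ((i,) for i in range(N)),
)
conn.executemany(
    "INSERT INTO specimen_own_place "
    "VALUES (?, 46.85, 147.84, 'Entry Person', '2015-09-15')",
    ((i,) for i in range(N)),
)

# Fix in the shared model: one row changes.
shared_rows_fixed = conn.execute(
    "UPDATE locality SET dec_lat = 64.85, dec_long = -147.84 "
    "WHERE locality_id = 1"
).rowcount

# Fix in the per-specimen model: every specimen row changes.
own_rows_fixed = conn.execute(
    "UPDATE specimen_own_place SET dec_lat = 64.85, dec_long = -147.84 "
    "WHERE dec_lat = 46.85 AND dec_long = 147.84"
).rowcount

print(shared_rows_fixed, own_rows_fixed)
```

A single UPDATE statement handles both cases here, but in the per-specimen model it only works if every wrong row can be found by its values, and each row's who/when presumably needs attention too.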
The introduction of "verbatim coordinates" to collecting event is an attempt to have both geospatial-capable locality objects and whatever someone scribbled on some label, verbatim. I think anything preventing those two actions is "only" an interface problem. Adding more metadata to the locality node is a modeling problem, and one which potentially completely changes the nature of the data.