Closed dustymc closed 1 year ago
Can this just be 'part of Arctos' - can we just automatically georeference anything that isn't in some transparent way?
I say yes, with the default being opting in. For collections or objects that you can't / don't want georeferenced, it could be encumbered or turned off. This has, frankly, been one of the biggest headaches for the EH collection coming into Arctos. Our collections are, for the most part, identified by geopolitical / community name not by some GPS point documented by a scientist in the field. I feel like I shouldn't have to look up the lat/long of Nenana every time something is donated from there. I would think that Arctos would know the Higher Geography is the Yukon-Koyukuk census area, and the Indigenous name is Toghotili, and the coordinates are 64.558056, -149.090556 because that's all embedded in the Wikipedia entry. If I have a street address, Google maps can figure out all that info, why shouldn't Arctos?
Automation is great - as long as it is documented ArctosDB/arctos#4866
Google maps can figure out all that info, why shouldn't Arctos?
In part, because Google charges a few thousand dollars per month for that sort of thing (https://github.com/ArctosDB/internal/issues/44). I have a LOT of information - I think everything you brought up - but if we really need Google-level information then we've got to find Google-level resources to support it.
I agree with @AJLinn about making this feature optional.
Would there be something to indicate that the georeferencing was done automatically, and if/when a human has reviewed it? This ties into Issue ArctosDB/arctos#4866.
My concern is that we have specimens with very specific localities ("3 miles southwest of", etc.) and specimens that we choose not to georeference due to lack of information (vague: which Lake George? or too broad: we do not georeference anything larger than county level because the information covers such a broad area). An automatic process is not going to recognize these nuances, and reviewing them all is going to take time.
optional
I think that probably means
If that isn't acceptable for some perplexing reason, would anyone use something like a georeference bot that could be turned on to do this by collection?
but I suppose there could be some sort of 'go away automation' locality attribute or something. "Default on" would be good, however that might work - people don't tend to use these kinds of things otherwise.
something to indicate that the georeferencing was done automatically
Yes, if someone says GO! today I'll add a locality attribute and go.
recognize these nuances
I think there are two possible situations.
3 miles southwest of
You can see what I'd use for this in edit locality (and reports and such), but that's generally handled reasonably well by the automation.
In answer to "Would there be something to indicate that the georeferencing was done automatically, and if/when a human has reviewed it? This ties into Issue https://github.com/ArctosDB/arctos/issues/4866": would this be via a locality attribute? And would the attribute indicate which version of the georeferencing the date and determiner were associated with? In other words, if I pull up the catalog record and see a map and coordinates, would I be able to see who made the current determination and when (e.g. the bot), and potentially see that someone else made an older determination? Or would the older info need to be in a separate specimen event?
Would there be something to indicate that the georeferencing was done automatically
Yep, https://github.com/ArctosDB/arctos/issues/4866#issuecomment-1202988399
if/when a human has reviewed it
That's why https://arctos.database.museum/info/ctDocumentation.cfm?table=ctverificationstatus exists.
via a locality attribute
Yes, I think @Nicole-Ridgwell-NMMNHS has this exactly right in the other issue.
georeferencing the date and determiner were associated with? In other words, if I pull up the catalog record and see a map and coordinates, would I be able to see who made the current determination and when (e.g. the bot),
Yup.
and potentially see that someone else made an older determination? Or would the older info need to be in a separate specimen event?
I'm proposing to georeference localities which are not georeferenced. That's it. I don't think any bot will ever be allowed to overwrite what a human has done, this one certainly won't. (There's not much reason for humans to overwrite each other here either.) MAYBE some future iteration will make localities or consider how they're used or look at current data or something, but for now I'm proposing a purely additive process (which can be easily 'un-added' because it would only be added to localities which have no coordinates, and accompanied by the [new] 'georeference source' locality attribute attributed to a new bot-agent).
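A minimal sketch of what "purely additive" and "easily un-added" could look like, assuming a simplified in-memory locality record. The `Locality` class, the `lookup` callback, the attribute tuple layout and the agent name `georeference_bot` are all hypothetical illustrations, not the actual Arctos schema or bot code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

BOT_AGENT = "georeference_bot"  # hypothetical agent name


@dataclass
class Locality:
    locality_id: int
    spec_locality: str
    dec_lat: Optional[float] = None
    dec_long: Optional[float] = None
    # Each attribute is an (attribute_type, attribute_value, determiner) tuple.
    attributes: list = field(default_factory=list)


def add_bot_georeference(
    loc: Locality,
    lookup: Callable[[str], Optional[Tuple[float, float]]],
) -> bool:
    """Add coordinates only when the locality has none; never overwrite."""
    if loc.dec_lat is not None or loc.dec_long is not None:
        return False  # existing coordinates are left untouched
    result = lookup(loc.spec_locality)
    if result is None:
        return False  # the service could not place this locality
    loc.dec_lat, loc.dec_long = result
    loc.attributes.append(("georeference source", "automated", BOT_AGENT))
    return True


def undo_bot_georeference(loc: Locality) -> bool:
    """Remove coordinates only where the bot's own attribute is present."""
    bot_attrs = [
        a for a in loc.attributes
        if a[0] == "georeference source" and a[2] == BOT_AGENT
    ]
    if not bot_attrs:
        return False
    loc.dec_lat = loc.dec_long = None
    loc.attributes = [a for a in loc.attributes if a not in bot_attrs]
    return True
```

Because every bot-created georeference carries the bot's attribute, the undo step can find exactly those localities and nothing else.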
I'm tentatively going next task with this - seems like everyone thinks it's a decent idea, should be easy to un-do if that turns out not to be the case, and it seems like a good (and also easy enough to un-do) way to advance/see in action https://github.com/ArctosDB/arctos/issues/4866.
@gracz-UNL
I have one exception to this mapping rule-- the entire archives collection MVZ:ARCH
Right now I want the default archival record to skip localities because a point cannot describe our records. We don't have features and polygons as a matter of course in Arctos (simply not mature yet) and here you are proposing to add coordinates, which is just not accurate. So I would like to skip this collection in this autoload of coordinates. OK?
a point cannot describe our records.
Good thing there are no points in Arctos!
If you really insist, you can bulkload georeference source
locality attributes - there should be tools to make that pretty simple.
It would be more useful to more people to georeference them with the 'just use geography' option. Data do not need to be precise to be useful. There's no bulkloader for that, but it should be a straightforward SQL update.
Hi, I'm curious. Has this happened? If so, how do we find the bot georeffed localities? I see where to search locality attributes for georeference source, but what words do I search for in that field? There is nothing noted in here: https://handbook.arctosdb.org/documentation/coordinates.html#georeference-source
Lastly, how many MVZ mammals/localities would be affected by this?
Chris
Has this happened?
It's an ongoing process. It is running.
If so, how do we find the bot georeffed localities?
There is nothing noted in here:
Work in progress: https://github.com/ArctosDB/arctos/issues/5120
how many MVZ mammals
Very roughly
arctosprod@arctos>> select count(*) from flat where guid_prefix='MVZ:Mamm' and dec_lat is null;
count
-------
20225
Some of those probably have multiple localities, the bot won't be able to figure out some localities, etc., but that's about the scope.
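To separate "records without coordinates" from "localities the bot would actually visit", the count above could be paired with a distinct-locality count. This is a rough sketch against an in-memory SQLite stand-in: `flat`, `guid_prefix` and `dec_lat` come from the query above, while the `locality_id` column and the sample rows are assumptions made up for illustration.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build a tiny stand-in for the flat table (sample rows are invented)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table flat (guid text, guid_prefix text, "
        "locality_id integer, dec_lat real)"
    )
    conn.executemany(
        "insert into flat values (?, ?, ?, ?)",
        [
            ("MVZ:Mamm:1", "MVZ:Mamm", 10, None),
            ("MVZ:Mamm:2", "MVZ:Mamm", 10, None),  # shares locality 10
            ("MVZ:Mamm:3", "MVZ:Mamm", 11, None),
            ("MVZ:Mamm:4", "MVZ:Mamm", 12, 37.9),  # already georeferenced
            ("MVZ:Bird:1", "MVZ:Bird", 13, None),  # other collection
        ],
    )
    return conn


def count_ungeoreferenced(conn: sqlite3.Connection, guid_prefix: str):
    """Return (record count, distinct locality count) lacking coordinates."""
    records = conn.execute(
        "select count(*) from flat "
        "where guid_prefix = ? and dec_lat is null",
        (guid_prefix,),
    ).fetchone()[0]
    localities = conn.execute(
        "select count(distinct locality_id) from flat "
        "where guid_prefix = ? and dec_lat is null",
        (guid_prefix,),
    ).fetchone()[0]
    return records, localities
```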
OK, how do we stop it from georeffing captive colonies? We specifically do not want these georeffed as they are non-natural localities. We have thousands of "Lab colony, UC Berkeley" that should not be georeferenced. Two mammals from Walnut Creek are placed at Pleasant Hill, 3 miles to the north. Both have good localities, we just hadn't gotten around to georeferencing them yet. See maps. We are likely going to have thousands of records like this.
If we only have county and "no specific locality recorded" does the bot do them too? These put a dot in the middle of nowhere, but could be confused with localities where all we get from collectors are GPS coordinates and no written description. I don't think most users are going to pay attention to georeference source. Here is a bear with no locality apart from county, but plotted in a random spot in the center of Sonoma County. https://arctos.database.museum/guid/MVZ:Mamm:4769
Here is another bear from "S Fork Salmon River" with an error radius of only 301 meters. That's a 40 mile river.
Etc., etc.
how do we stop it from georeffing captive colonies
"Don't" is always my preference....
non-natural localities
There's a way to say that; I can't see any defensible reason to withhold information from someone who might have a use for it.
If you really must block the bot, https://github.com/ArctosDB/arctos/issues/4916#issuecomment-1271666878
If we only have county and "no specific locality recorded"
Not using the recommended term (“No specific locality recorded.”, there's a button on the edit form) for that will often result in the bot doing something crazy. Otherwise, it handles counties fine.
dot
The bot usually includes error.
all we get from collectors are GPS coordinates
Presumably those are georeferenced, the bot will ignore them.
random spot in the center of Sonoma County.
Not sure what you're seeing, the georef covers the county
error radius of only 301 meters
I think that's probably https://github.com/ArctosDB/documentation-wiki/issues/291 - abbreviations confuse the bot (and people). I can dig deeper if you want to share the link.
How is the georeference_bot parsing data and mapping? @cjconroy just pointed out some serious issues - here are some examples:
https://arctos.database.museum/guid/MVZ:Mamm:19290 https://arctos.database.museum/guid/MVZ:Mamm:230812 Locality is San Francisco, and it's being parsed as San Francisco. However, it's mapping to Bolinas Bay with a huge error radius that is far beyond San Francisco.
https://arctos.database.museum/guid/MVZ:Mamm:12640 Locality is San Joaquin Valley. Point is plotting in Bay Area (near Black Diamond Mines Regional Park) with relatively small error. This is nowhere near the San Joaquin Valley, and that should have a large error.
https://arctos.database.museum/guid/MVZ:Mamm:240875 This one has a street address in Alamo (298 Davey Crockett Court, Alamo) but is plotting to the city of Crockett.
Also copying @mkoo @atrox10
"serious errors" just about always come back to https://github.com/ArctosDB/documentation-wiki/issues/291 - the data doesn't follow the documentation, or it does but the documentation needs to be updated.
The more-recent stuff will tell you what happened.
I could just dump everything where that's missing and let them run again, if that's useful.
Or you can just click geolocate and it'll tell you what it's doing.
For whatever reason, 301 meters is geolocatespreche for "not a clue..." - weird, but it makes the probably-needs-help stuff easy to find.
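If 301 meters really does act as a "no confident match" marker, a filter along these lines could pull the probably-needs-help localities into a review queue. This is only a sketch: the dictionary keys (`locality_id`, `determiner`, `max_error_m`) and the agent name are hypothetical, and the meaning of 301 is taken from the comment above rather than from any GeoLocate documentation.

```python
NO_MATCH_ERROR_M = 301  # sentinel value described in this thread


def probably_needs_help(georefs, determiner="georeference_bot"):
    """Return locality IDs whose bot georeference carries the sentinel error."""
    return [
        g["locality_id"]
        for g in georefs
        if g["determiner"] == determiner
        and g["max_error_m"] == NO_MATCH_ERROR_M
    ]
```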
Is there any communication with Nelson? If possible, adding whatever he needs to eg San Joaquin Valley would likely have huge impacts within Arctos (and beyond, but that's icing).
Yes, really don’t like the GeoRef bot. Probably generating lots of bad data that we don’t see until coming upon them randomly, which is far worse than having un-georeferenced records which can be worked through and eventually georeferenced correctly.
Jonathan L. Dunnum Ph.D. (he, him, his) Senior Collection Manager Division of Mammals, Museum of Southwestern Biology University of New Mexico Albuquerque, NM 87131 (505) 277-9262 Fax (505) 277-1351
Chair, Systematic Collections Committee, American Society of Mammalogists Latin American Fellowship Committee, ASM
MSB Mammals website: http://www.msb.unm.edu/mammals/index.html Facebook: http://www.facebook.com/MSBDivisionofMammals
Shipping Address: Museum of Southwestern Biology Division of Mammals University of New Mexico CERIA Bldg 83, Room 204 Albuquerque, NM 87131
From: Carla @.> Sent: Thursday, December 15, 2022 1:15 PM To: @.> Cc: @.***> Subject: Re: [ArctosDB/arctos] automagic georeferences (Issue ArctosDB/arctos#4916)
don’t see until coming upon them randomly
Not at all - outliers are detectable.
eventually
Data suggest that's a bit of an exaggeration....
@dustymc I still don't get it. The parse pattern is 'San Francisco' but it's not mapping to SF ???
GeoLocate also seems to be doing a terrible job of it.
San Francisco shouldn't be that hard.
GeoLocate also
It's all geolocate, so what you see on one will be what you see on the other, unless something changed.
San Francisco shouldn't be that hard.
I'm just using the tools I have (and they're pretty fabulous most of the time). SOMETHING, SOMEWHERE is obviously confusing geolocate. Have you tried it without the bird store bit? Experimentation is how I got what's in https://github.com/ArctosDB/documentation-wiki/issues/291 - not ideal, but some clear recommendations have fallen out and I suppose it's better than the 'I hope something eventually works like I think it should' approach that's led to the current documentation.....
just "San Francisco" is definitely better than the city with lab or store.
One suggestion: add a link where it has the georeference_bot info that goes directly to that GeoLocate map so an Arctos user can adjust in GeoLocate and (re)save to the application. Is that feasible?
Is there a way to opt out of the bot for certain collections? Or make it so that the bot's georeferencing doesn't appear until someone has confirmed that it is correct? I'm concerned that previous work is being overwritten.
Example: https://arctos.database.museum/guid/CHAS:Herb:1320
This catalog record was reviewed by our georeferencing intern in March. She determined that the specific locality was too broad to be georeferenced and made that note in Locality Remarks. The bot automatically georeferenced the catalog record a few months later. I only realized it because I was looking at records from Illinois and noticed that the map at the top of the search results page showed a couple of coordinates way out in the middle of Lake Michigan. I went into the record to de-georeference it, but we already had someone evaluate these records prior to the bot and I would like to keep their data.
opt out
https://github.com/ArctosDB/arctos/issues/4916#issuecomment-1271666878
Your screenshot looks correct, but you probably want to delete the bot's georeference source.
I'm concerned that previous work is being overwritten.
It is not.
determined that the specific locality was too broad to be georeferenced
It is not.
@dustymc what about my suggestion of adding a link in the locality attribute table for the georeference_bot where you click and it goes directly to the GeoLocate map so an Arctos user can adjust in GeoLocate and (re)save to the application? That will make it easier to adjust the georeferences. Otherwise you need to take several steps to get there.
I do like the idea of somehow validating the georeference_bot. but I guess that's part of the locality verification status? Will the bot create a locality attribute if the locality is verified and locked?
what about my suggestion
I haven't had time to dig, but at first glance I think that would put a lot of edit links in front of users who can't access them.
somehow validating the georeference_bot
https://github.com/ArctosDB/arctos/issues/5302 is one such service. If something else is accessible in some way I'm happy to look at using it.
Will the bot create a locality attribute if the locality is verified and locked?
I don't think so but not 100% sure - and surely nobody's locking ungeoreferenced events?!? In any case the important thing about the bot is that it does things as a dedicated agent, so whatever it does is easy to find and alter in any way.
could only logged in users see those links to get to GeoLocate? I'm just trying to streamline the process for fixing those 'bad' georeferences.
only logged in users see
I think that might be outright magic!
streamline the process
Gotcha. I can help with a search/reports/sql/whatever - prioritizing https://github.com/ArctosDB/arctos/issues/4995 might help with that, at least for those with asserted geography.
Here is another crazy georef by bot. I found this by mapping all of a certain subspecies of mole that should be generally south of the SF delta. Note the point in the northeast. It should be in with the bulk of the others near San Jose, CA. However, it plots to really the middle of nowhere. The locality is on a UC Reserve. "main road near barn, Blue Oak Ranch Reserve, Santa Clara County, California". However, it plots near the end of "Oak Ranch Road" in Sierra County. At least this one was very obvious. The bot georeffed localities are sometimes pretty close, but also sometimes far off like this.
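The "map everything for one taxon and look for the odd point" check can be roughed out in code. The sketch below flags points that sit farther than a chosen distance from the median center of a set of coordinates. It is one simple heuristic, not how Arctos detects outliers; the function names and the distance cutoff are assumptions, and taking a median of longitudes would misbehave for data that straddles the antimeridian.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_outliers(points, max_km):
    """points is a list of (guid, lat, lon); return guids far from the center."""
    if not points:
        return []
    center_lat = median(p[1] for p in points)
    center_lon = median(p[2] for p in points)
    return [
        guid
        for guid, lat, lon in points
        if haversine_km(lat, lon, center_lat, center_lon) > max_km
    ]
```

Run per taxon (or per county, per collector, and so on), a check like this would surface a point that plots hundreds of kilometers from the rest of its group, though it cannot tell whether the outlier is a bad georeference or a genuine range extension.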
How do I find all MVZ mammals with georeference protocol = automated georeference? I've turned on all locality options and it does not seem to be there.
I can't find georeference protocol on the search for locality page either. I think I must be missing something.
It's under locality attributes. Attribute type = georeference protocol, Attribute value = Automatically created by Arctos services. OR Attribute determiner = georeference bot
Sorry, I still cannot find it. Attached are screen shots of my main search page and locality search pages with drop downs under locality attributes. There is no georeference protocol. Can you post a picture?
Sorry, I got protocol and source mixed up. Georeference source is what you need.
Thanks, found it. I thought it would be something in a drop down menu.
Here is another bad bot georef. This is another one that should have gone to the Blue Oak Ranch Reserve near San Jose, CA, but like the example above, plots to the wrong county. This is LocID 10114917, MVZ:Mamm:218705. It should at minimum have gone to the correct county, then ideally to the Blue Oak Ranch Reserve, but there probably is no polygon for that.
I applaud Chris for working through these. Something I have not yet been able to make time for. Disturbs me to think about the poor georeference data we now have in our collection. Returning to my original thoughts on this, no data are far better than bad data. I would prefer all of our automagiced georeferences were automagiced back. We were working our way through the non georeferenced records prior to this.
From: cjconroy @.> Sent: Thursday, March 2, 2023 10:20 AM To: ArctosDB/arctos @.> Cc: Jonathan Dunnum @.>; Comment @.> Subject: Re: [ArctosDB/arctos] automagic georeferences (Issue #4916)
I'm only throwing out a few examples. I also have not set aside the workforce it will take to correct these. More than 11,000 MVZ mammals were autogeoreffed and need to be reviewed. This is after a report found more than 25,000 MVZ mammals with mismatch in higher geography between georef and stated locality. It is going to take multiple lifetimes to fix these. The autogeoref is only making the pile a lot taller.
Just joining the conversation since I hadn't seen this thread. I'm wary of the fact that georeference_protocol can easily be left out when people download results due to the results form customization option. If someone selects lat and long coordinate fields, datum and error, it then looks like all associated coordinate data are curatorially asserted since the "automated georeference" remark is only housed under the georeference_protocol field. They will miss the fact that it was a bot.
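One way to picture a guard against this: a download routine that always carries the provenance column along whenever coordinate columns are requested. This is a hypothetical sketch, not how the Arctos results form works; `export_csv` and the field names `dec_lat`, `dec_long` and `georeference_protocol` are used here only for illustration.

```python
import csv
import io

COORD_FIELDS = {"dec_lat", "dec_long"}
PROVENANCE_FIELD = "georeference_protocol"


def export_csv(rows, requested_fields):
    """Write rows as CSV, forcing provenance in whenever coordinates are."""
    fields = list(requested_fields)
    if COORD_FIELDS & set(fields) and PROVENANCE_FIELD not in fields:
        fields.append(PROVENANCE_FIELD)
    buf = io.StringIO()
    # extrasaction="ignore" drops any keys the user did not ask for.
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With something like this in place, a user who selects only latitude and longitude would still receive the column that says whether a bot produced them.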
There's a lot of concerns voiced here, and the ability to verify coordinates is not feasible for me anytime soon, and I assume for most other folks since it is such a gargantuan task (honestly could be years). I am extremely concerned we are pushing a lot of inaccurate data out there that is extremely difficult for a user to discern that these are not curatorially asserted. I don't think we should be publishing anything or pushing to aggregators anything that is not verified.
I second Emily's comments!
From: Emily Braker @.> Sent: Monday, April 10, 2023 11:29 AM To: ArctosDB/arctos @.> Cc: Jonathan Dunnum @.>; Comment @.> Subject: Re: [ArctosDB/arctos] automagic georeferences (Issue #4916)
third
Is there a way to opt out
(And I think it'll ignore named localities too.)
suggested_lat and suggested_long
We've had that for years (since biogeomancer).
Is there a reason this is closed? It seems like there isn't a resolution.
Also, and I think I have said this elsewhere, but I can't find it - we now have two different places for designating georeference source. If we only had one, maybe the issue @ebraker brings up would be moot?
If we only had one,
That's one of the reasons I've been trying to prioritize https://github.com/ArctosDB/arctos/issues/5193 for about as long as I can remember....
I understand that there is not currently a way to turn the bot off, but I'm asking if we can make it so (default off, with people opting in). Reading back through this thread there was a suggestion by Dusty:
I suppose there could be some sort of 'go away automation'
AND/OR I am suggesting an overt field title that specifically states "automated_lat" "automated_long" or "bot/suggested/etc" so that it is separate from our existing lat/long fields and makes it entirely clear to an unsuspecting user that we as data providers may have never approved nor set eyes on this information. I strongly feel that this is unauthorized data that is so easily integrated in datasets, especially when the results form is customizable and "automated georeference" falls away. It needs to be hardcoded into field name. I think lat/long should be reserved for verified data (or at least data that a human operator approved to load into Arctos). There could be a workflow - once an operator has reviewed the bot coordinates, they move out of automated_lat/long and into designated coordinate fields.
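The proposed workflow could be sketched roughly as follows, assuming a simplified record with separate `automated_lat`/`automated_long` fields alongside the curated `dec_lat`/`dec_long`. The class, the `verified_by` field and the `promote` function are hypothetical illustrations of the idea, not an existing Arctos feature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoordinateRecord:
    dec_lat: Optional[float] = None        # reserved for reviewed data
    dec_long: Optional[float] = None
    automated_lat: Optional[float] = None  # bot suggestion, unreviewed
    automated_long: Optional[float] = None
    verified_by: Optional[str] = None


def promote(rec: CoordinateRecord, reviewer: str) -> CoordinateRecord:
    """Move reviewed bot coordinates into the curated coordinate fields."""
    if rec.automated_lat is None or rec.automated_long is None:
        raise ValueError("no automated coordinates to promote")
    if rec.dec_lat is not None or rec.dec_long is not None:
        raise ValueError("curated coordinates already exist")
    rec.dec_lat, rec.dec_long = rec.automated_lat, rec.automated_long
    rec.automated_lat = rec.automated_long = None
    rec.verified_by = reviewer
    return rec
```

Until `promote` is called by a person, the curated fields stay empty, so a download of `dec_lat`/`dec_long` alone would never include unreviewed bot output.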
we now have two different places for designating georeference source. If we only had one, maybe the issue @ebraker brings up would be moot?
It would be an improvement but it still doesn't help the core issue here.
I still advocate for the solutions provided above, but there's 'match type' and 'scores' available from the bot (which aren't defined anywhere) that I assume conveys fitness - perhaps we can implement a very high threshold, eg. only >95% confidence are appended to records.
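A threshold rule like that could look something like the sketch below. Since the bot's scores are not defined anywhere, this assumes purely for illustration that each candidate carries a numeric `score` on a 0 to 100 scale where higher means a better fit; the key name and the scale are guesses.

```python
def accept_candidate(candidates, min_score=95.0):
    """Return the best candidate only if its score exceeds min_score."""
    best = max(candidates, key=lambda c: c["score"], default=None)
    if best is None or best["score"] <= min_score:
        return None  # leave the locality un-georeferenced
    return best
```

Anything at or below the cutoff is simply left alone, which keeps the locality in the "needs a human" pile instead of publishing a weak guess.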
edited: To be clear, I think automated georeferencing is a cool feature - it is just so imperfect with complex localities, aquatic localities, etc. and we as data providers are considered to be authorities that serve up accurate information. Since no one else is doing this sort of automation, our audiences are very unlikely to even know to interrogate provided coordinate data further.
especially when the results form is customizable and "automated georeference" falls away
Georeference protocol isn't even visible by default.
@mkoo when does the geography committee meet next? I find this issue super distressing and I'm going to be out of town (presenting on Arctos!) next Issues meeting.
See also #6151
My issue with "Bot to automatically georeference localities which have no georeference nor locality attribute of type georeference source" is that we have had individuals go through each locality and manually verify that something is not specific enough, or there's something inconsistent about the verbatim data, and we have evaluated it as unable to be accurately georeferenced, but the bot doesn't recognize that. I've been finding catalog records that have a comment attached to locality remarks that say "cannot georeference due to xyz", but then the bot has made a guess anyway.
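A pre-check of this kind could respect those human evaluations. The sketch below skips a locality when it already has a georeference source attribute or when its locality remarks contain a phrase such as "cannot georeference". The function name, the phrase list and the argument shapes are assumptions for illustration; free-text matching like this will miss differently worded remarks.

```python
import re

# Phrases that suggest a person already declined to georeference.
SKIP_PATTERN = re.compile(
    r"\b(cannot|can't|unable to|do not|don't)\s+georeference\b",
    re.IGNORECASE,
)


def bot_should_skip(locality_remarks, attribute_types):
    """Return True when the bot should leave this locality alone."""
    if "georeference source" in attribute_types:
        return True
    return bool(locality_remarks and SKIP_PATTERN.search(locality_remarks))
```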
Example: https://arctos.database.museum/guid/CHAS:Herb:1880.9.200
OK so what are the options here? I think it is very clear that all collection managers who have weighed in are in agreement that this experiment did not work and the bot has resulted in major georeferencing errors across our collections. Can we revert to prebot values?
@jldunnum you requested that in another issue, I'm working on it.
I've mentioned this in various contexts from time to time, but maybe it's never been the topic: I have everything I need to georeference, or check the georeference of, perhaps 95% of all events.
Can this just be 'part of Arctos' - can we just automatically georeference anything that isn't in some transparent way?
If that isn't acceptable for some perplexing reason, would anyone use something like a georeference bot that could be turned on to do this by collection?
If none of that's acceptable - and I hope that isn't the case - then maybe I should do less with the automation; instead of appearing in various searches for probably-cryptic reasons (eg because I have a bunch of complicated behind-the-scenes code running), perhaps records with cruddy or nonexistent georeferences should just be excluded from most results (by virtue of having poor data, not by any extra exclusionary action)?
I have cool data; please help me do something interesting with them!