ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

automagic georeferences #4916

Closed dustymc closed 1 year ago

dustymc commented 2 years ago

I've mentioned this in various contexts from time to time, but maybe it's never been the topic: I have everything I need to georeference, or check the georeference of, perhaps 95% of all events.

Can this just be 'part of Arctos' - can we just automatically georeference anything that isn't in some transparent way?

If that isn't acceptable for some perplexing reason, would anyone use something like a georeference bot that could be turned on to do this by collection?

If none of that's acceptable - and I hope that isn't the case - then maybe I should do less with the automation; instead of appearing in various searches for probably-cryptic reasons (eg because I have a bunch of complicated behind-the-scenes code running), perhaps records with cruddy or nonexistent georeferences should just be excluded from most results (by virtue of having poor data, not by any extra exclusionary action)?

I have cool data; please help me do something interesting with them!

AJLinn commented 2 years ago

Can this just be 'part of Arctos' - can we just automatically georeference anything that isn't in some transparent way?

I say yes, with the default being opting in. For collections or objects that you can't / don't want georeferenced, it could be encumbered or turned off. This has, frankly, been one of the biggest headaches for the EH collection coming into Arctos. Our collections are, for the most part, identified by geopolitical / community name not by some GPS point documented by a scientist in the field. I feel like I shouldn't have to look up the lat/long of Nenana every time something is donated from there. I would think that Arctos would know the Higher Geography is the Yukon-Koyukuk census area, and the Indigenous name is Toghotili, and the coordinates are 64.558056, -149.090556 because that's all embedded in the Wikipedia entry. If I have a street address, Google maps can figure out all that info, why shouldn't Arctos?

Jegelewicz commented 2 years ago

Automation is great - as long as it is documented ArctosDB/arctos#4866

dustymc commented 2 years ago

Google maps can figure out all that info, why shouldn't Arctos?

In part, because Google charges a few thousand dollars per month for that sort of thing (https://github.com/ArctosDB/internal/issues/44). I have a LOT of information - I think everything you brought up - but if we really need Google-level information then we've got to find Google-level resources to support it.

wellerjes commented 2 years ago

I agree with @AJLinn about making this feature optional.

Would there be something to indicate that the georeferencing was done automatically, and if/when a human has reviewed it? This ties into issue ArctosDB/arctos#4866.

My concern is that we have specimens with very specific localities ("3 miles southwest of", etc.) and specimens that we choose not to georeference due to lack of information (vague: which Lake George? or too broad: we do not georeference anything larger than county level because the information covers such a broad area). An automatic process is not going to recognize these nuances, and reviewing them all is going to take time.

dustymc commented 2 years ago

optional

I think that probably means

If that isn't acceptable for some perplexing reason, would anyone use something like a georeference bot that could be turned on to do this by collection?

but I suppose there could be some sort of 'go away automation' locality attribute or something. "Default on" would be good, however that might work - people don't tend to use these kinds of things otherwise.

something to indicate that the georeferencing was done automatically

Yes, if someone says GO! today I'll add a locality attribute and go.

recognize these nuances

I think there are two possible situations.

  1. There's no georeference, most users never find the thing, nothing really ever gets better.
  2. There is a georeference so the thing becomes accessible - people might find and use it (which is some indication that it's not obviously wrong, maybe), or it might end up in 'this doesn't add up' reports and such - the spatial data adds value even if it is wildly wrong for some reason. (And (3), sorta, probably: I add behind-the-scenes magic and some users probably get what they need despite bad or nonexistent asserted data.)

3 miles southwest of

You can see what I'd use for this in edit locality (and reports and such), but that's generally handled reasonably well by the automation.

campmlc commented 1 year ago

In answer to "Would there be something to indicate that the georeferencing was done automatically, and if/when a human has reviewed it? This ties into issue https://github.com/ArctosDB/arctos/issues/4866": would this be via a locality attribute? And would the attribute indicate which version of the georeferencing the date and determiner were associated with? In other words, if I pull up the catalog record, and I see a map and coordinates, I would be able to see who made the current determination and when (e.g. the bot), and potentially see that someone else made an older determination? Or would the older info need to be in a separate specimen event?

dustymc commented 1 year ago

Would there be something to indicate that the georeferencing was done automatically

Yep, https://github.com/ArctosDB/arctos/issues/4866#issuecomment-1202988399

if/when a human has reviewed it

That's why https://arctos.database.museum/info/ctDocumentation.cfm?table=ctverificationstatus exists.

via a locality attribute

Yes, I think @Nicole-Ridgwell-NMMNHS has this exactly right in the other issue.

georeferencing the date and determiner was associated with? In other words, if I pull up the catalog record, and I see a map and coordinates, I would be able to see who made the current determination and when (e.g. the bot),

Yup.

and potentially see that someone else made an older determination? Or would the older info need to be in a separate specimen event?

I'm proposing to georeference localities which are not georeferenced. That's it. I don't think any bot will ever be allowed to overwrite what a human has done, this one certainly won't. (There's not much reason for humans to overwrite each other here either.) MAYBE some future iteration will make localities or consider how they're used or look at current data or something, but for now I'm proposing a purely additive process (which can be easily 'un-added' because it would only be added to localities which have no coordinates, and accompanied by the [new] 'georeference source' locality attribute attributed to a new bot-agent).
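The "purely additive" rule described above can be sketched in a few lines. This is only an illustration, not the bot's actual code: the `Locality` class, the agent name `georeference bot`, and the attribute value `automated georeference` are assumptions made for the example, and the check for an existing georeference source attribute follows the bot description quoted later in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BOT_AGENT = "georeference bot"  # hypothetical agent name, for illustration only

# A locality attribute is modelled here as (type, value, determiner).
Attribute = Tuple[str, str, str]


@dataclass
class Locality:
    locality_id: int
    spec_locality: str
    dec_lat: Optional[float] = None
    dec_long: Optional[float] = None
    attributes: List[Attribute] = field(default_factory=list)


def bot_may_georeference(loc: Locality) -> bool:
    """True only when there are no coordinates and no 'georeference source' attribute."""
    has_coords = loc.dec_lat is not None or loc.dec_long is not None
    has_source = any(a[0] == "georeference source" for a in loc.attributes)
    return not has_coords and not has_source


def apply_bot_georeference(loc: Locality, lat: float, lng: float) -> bool:
    """Add coordinates plus a bot-attributed source attribute; never overwrite anything."""
    if not bot_may_georeference(loc):
        return False
    loc.dec_lat, loc.dec_long = lat, lng
    loc.attributes.append(("georeference source", "automated georeference", BOT_AGENT))
    return True


def undo_bot_georeference(loc: Locality) -> bool:
    """'Un-add': clear coordinates only where the bot's own source attribute is present."""
    bot_attrs = [a for a in loc.attributes
                 if a[0] == "georeference source" and a[2] == BOT_AGENT]
    if not bot_attrs:
        return False
    loc.dec_lat = loc.dec_long = None
    loc.attributes = [a for a in loc.attributes if a not in bot_attrs]
    return True
```

Because every bot georeference carries an attribute determined by the bot agent, the undo step never has to guess which coordinates came from a human.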

I'm tentatively going next task with this - seems like everyone thinks it's a decent idea, should be easy to un-do if that turns out not to be the case, and it seems like a good (and also easy enough to un-do) way to advance/see in action https://github.com/ArctosDB/arctos/issues/4866.

@gracz-UNL

mkoo commented 1 year ago

I have one exception to this mapping rule-- the entire archives collection MVZ:ARCH

Right now I want the default archival record to skip localities because a point cannot describe our records. We don't have features and polygons as a matter of course in Arctos (simply not mature yet) and here you are proposing to add coordinates, which is just not accurate. So I would like to skip this collection in this autoload of coordinates. OK?

dustymc commented 1 year ago

a point cannot describe our records.

Good thing there are no points in Arctos!

If you really insist, you can bulkload georeference source locality attributes - there should be tools to make that pretty simple.

It would be more useful to more people to georeference them with the 'just use geography' option. Data do not need to be precise to be useful. There's no bulkloader for that, but it should be a straightforward SQL update.

cjconroy commented 1 year ago

Hi, I'm curious. Has this happened? If so, how do we find the bot georeffed localities? I see where to search locality attributes for georeference source, but what words do I search for in that field? There is nothing noted in here: https://handbook.arctosdb.org/documentation/coordinates.html#georeference-source

Lastly, how many MVZ mammals/localities would be affected by this?

Chris

dustymc commented 1 year ago

Has this happened?

It's an ongoing process. It is running.

If so, how do we find the bot georeffed localities?

Screen Shot 2022-10-13 at 1 45 39 PM

There is nothing noted in here:

Work in progress: https://github.com/ArctosDB/arctos/issues/5120

how many MVZ mammals

Very roughly

arctosprod@arctos>> select count(*) from flat where guid_prefix='MVZ:Mamm' and dec_lat is null;
 count 
-------
 20225

Some of those probably have multiple localities, the bot won't be able to figure out some localities, etc., but that's about the scope.

cjconroy commented 1 year ago

OK, how do we stop it from georeffing captive colonies? We specifically do not want these georeffed as they are non-natural localities. We have thousands of "Lab colony, UC Berkeley" that should not be georeferenced. Two mammals from Walnut Creek are placed at Pleasant Hill, 3 miles to the north. Both have good localities, we just hadn't gotten around to georeferencing them yet. See maps. We are likely going to have thousands of records like this.

If we only have county and "no specific locality recorded" does the bot do them too? These put a dot in the middle of nowhere, but could be confused with localities where all we get from collectors are GPS coordinates and no written description. I don't think most users are going to pay attention to georeference source. Here is a bear with no locality apart from county, but plotted in a random spot in the center of Sonoma County. https://arctos.database.museum/guid/MVZ:Mamm:4769

Here is another bear from "S Fork Salmon River" with an error radius of only 301 meters. That's a 40 mile river.

Etc., etc.

[Images: bot location for the Walnut Creek localities; street address lookup in Google]

dustymc commented 1 year ago

how do we stop it from georeffing captive colonies

"Don't" is always my preference....

non-natural localities

There's a way to say that, I can't see any defensible reason to withhold information from someone who might have a use for it.

If you really must block the bot, https://github.com/ArctosDB/arctos/issues/4916#issuecomment-1271666878

If we only have county and "no specific locality recorded"

Not using the recommended term (“No specific locality recorded.”, there's a button on the edit form) for that will often result in the bot doing something crazy. Otherwise, it handles counties fine.

dot

The bot usually includes error.

all we get from collectors are GPS coordinates

Presumably those are georeferenced, the bot will ignore them.

random spot in the center of Sonoma County.

Not sure what you're seeing, the georef covers the county

Screen Shot 2022-10-13 at 2 33 02 PM

error radius of only 301 meters

I think that's probably https://github.com/ArctosDB/documentation-wiki/issues/291 - abbreviations confuse the bot (and people). I can dig deeper if you want to share the link.

ccicero commented 1 year ago

How is the georeference_bot parsing data and mapping? @cjconroy just pointed out some serious issues - here are some examples:

https://arctos.database.museum/guid/MVZ:Mamm:19290 https://arctos.database.museum/guid/MVZ:Mamm:230812 Locality is San Francisco, and it's being parsed as San Francisco. However, it's mapping to Bolinas Bay with a huge error radius that is far beyond San Francisco.

https://arctos.database.museum/guid/MVZ:Mamm:12640 Locality is San Joaquin Valley. Point is plotting in Bay Area (near Black Diamond Mines Regional Park) with relatively small error. This is nowhere near the San Joaquin Valley, and that should have a large error.

https://arctos.database.museum/guid/MVZ:Mamm:240875 This one has a street address in Alamo (298 Davey Crockett Court, Alamo) but is plotting to the city of Crockett.

Also copying @mkoo @atrox10

dustymc commented 1 year ago

"serious errors" almost always come back to https://github.com/ArctosDB/documentation-wiki/issues/291 - the data doesn't follow the documentation, or it does but the documentation needs to be updated.

The more-recent stuff will tell you what happened.

Screenshot 2022-12-15 at 12 32 12 PM

I could just dump everything where that's missing and let them run again, if that's useful.

Or you can just click geolocate and it'll tell you what it's doing.

Screenshot 2022-12-15 at 12 34 59 PM

For whatever reason, 301 meters is geolocatespreche for "not a clue..." - weird, but it makes the probably-needs-help stuff easy to find.
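If 301 meters really does act as a sentinel, finding the "probably-needs-help" georeferences is a simple filter. The sketch below assumes rows exported as dictionaries; the keys `locality_id`, `determiner`, and `max_error_m` and the agent name are hypothetical, not Arctos field names.

```python
# Assumption taken from the comment above: GeoLocate reports an error of
# 301 m when it has no real estimate of the uncertainty.
GEOLOCATE_NO_CLUE_ERROR_M = 301
BOT_AGENT = "georeference bot"  # hypothetical agent name


def probably_needs_help(rows):
    """Return locality IDs of bot georeferences whose error is the 301 m sentinel."""
    return [
        row["locality_id"]
        for row in rows
        if row.get("determiner") == BOT_AGENT
        and row.get("max_error_m") == GEOLOCATE_NO_CLUE_ERROR_M
    ]
```

A human georeference that happens to have a 301 m error is deliberately left out, since only the bot's rows are in question here.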

Is there any communication with Nelson? If possible, adding whatever he needs to eg San Joaquin Valley would likely have huge impacts within Arctos (and beyond, but that's icing).

jldunnum commented 1 year ago

Yes, really don’t like the GeoRef bot. Probably generating lots of bad data that we don’t see until coming upon them randomly, which is far worse than having un-georeferenced records which can be worked through and eventually georeferenced correctly.


Jonathan L. Dunnum Ph.D. (he, him, his) Senior Collection Manager Division of Mammals, Museum of Southwestern Biology University of New Mexico Albuquerque, NM 87131 (505) 277-9262 Fax (505) 277-1351

Chair, Systematic Collections Committee, American Society of Mammalogists Latin American Fellowship Committee, ASM

MSB Mammals website: http://www.msb.unm.edu/mammals/index.html Facebook: http://www.facebook.com/MSBDivisionofMammals

Shipping Address: Museum of Southwestern Biology Division of Mammals University of New Mexico CERIA Bldg 83, Room 204 Albuquerque, NM 87131

dustymc commented 1 year ago

don’t see until coming upon them randomly

Not at all - outliers are detectable.
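One way such outliers might be surfaced is to take all georeferenced records of a taxon and flag any that sit far from the rest. The sketch below is an assumption about how that could be done, not a description of any existing Arctos report: it measures great-circle distance from the component-wise median coordinate, and the 150 km cutoff is arbitrary.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_outliers(points, max_km=150.0):
    """points: iterable of (guid, lat, lon). Return GUIDs farther than max_km
    from the median latitude/longitude of the whole set.

    The median longitude is not meaningful for sets that straddle the
    antimeridian; this sketch ignores that case.
    """
    points = list(points)
    if not points:
        return []
    center_lat = median(p[1] for p in points)
    center_lon = median(p[2] for p in points)
    return [
        guid
        for guid, lat, lon in points
        if haversine_km(lat, lon, center_lat, center_lon) > max_km
    ]
```

The median is used rather than the mean so that a single badly placed record does not drag the reference point toward itself.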

eventually

Data suggest that's a bit of an exaggeration....

ccicero commented 1 year ago

@dustymc I still don't get it. The parse pattern is 'San Francisco' but it's not mapping to SF ???

GeoLocate also seems to be doing a terrible job of it. [image]

San Francisco shouldn't be that hard.

dustymc commented 1 year ago

GeoLocate also

It's all geolocate, so what you see on one will be what you see on the other, unless something changed.

San Francisco shouldn't be that hard.

I'm just using the tools I have (and they're pretty fabulous most of the time). SOMETHING, SOMEWHERE is obviously confusing geolocate. Have you tried it without the bird store bit? Experimentation is how I got what's in https://github.com/ArctosDB/documentation-wiki/issues/291 - not ideal, but some clear recommendations have fallen out and I suppose it's better than the 'I hope something eventually works like I think it should' approach that's led to the current documentation.....

ccicero commented 1 year ago

just "San Francisco" is definitely better than the city with lab or store.

One suggestion: add a link where it has the georeference_bot info that goes directly to that GeoLocate map so an Arctos user can adjust in GeoLocate and (re)save to the application. Is that feasible?

image

wellerjes commented 1 year ago

Is there a way to opt out of the bot for certain collections? Or make it so that the bot's georeferencing doesn't appear until someone has confirmed that it is correct? I'm concerned that previous work is being overwritten.

Example: https://arctos.database.museum/guid/CHAS:Herb:1320

This catalog record was reviewed by our georeferencing intern in March. She determined that the specific locality was too broad to be georeferenced and made that note in Locality Remarks. The bot automatically georeferenced the catalog record a few months later. I only realized it because I was looking at records from Illinois and noticed that the map at the top of the search results page showed a couple of coordinates way out in the middle of Lake Michigan. I went in to the record to de-georeference it, but we already had someone evaluate these records prior to the bot and I would like to keep their data. [image]

dustymc commented 1 year ago

opt out

https://github.com/ArctosDB/arctos/issues/4916#issuecomment-1271666878

Your screenshot looks correct, but you probably want to delete the bot's georeference source.

I'm concerned that previous work is being overwritten.

It is not.

determined that the specific locality was too broad to be georeferenced

It is not.

ccicero commented 1 year ago

@dustymc what about my suggestion of adding a link in the locality attribute table for the georeference_bot where you click and it goes directly to the GeoLocate map so an Arctos user can adjust in GeoLocate and (re)save to the application? That will make it easier to adjust the georeferences. Otherwise you need to take several steps to get there.

I do like the idea of somehow validating the georeference_bot. but I guess that's part of the locality verification status? Will the bot create a locality attribute if the locality is verified and locked?

dustymc commented 1 year ago

what about my suggestion

I haven't had time to dig, but at first glance I think that would put a lot of edit links in front of users who can't access them.

somehow validating the georeference_bot

https://github.com/ArctosDB/arctos/issues/5302 is one such service. If something else is accessible in some way I'm happy to look at using it.

Will the bot create a locality attribute if the locality is verified and locked?

I don't think so but not 100% sure - and surely nobody's locking ungeoreferenced events?!? In any case the important thing about the bot is that it does things as a dedicated agent, so whatever it does is easy to find and alter in any way.

ccicero commented 1 year ago

Could only logged-in users see those links to get to GeoLocate? I'm just trying to streamline the process for fixing those 'bad' georeferences.

dustymc commented 1 year ago

only logged in users see

I think that might be outright magic!

streamline the process

Gotcha. I can help with a search/reports/sql/whatever - prioritizing https://github.com/ArctosDB/arctos/issues/4995 might help with that, at least for those with asserted geography.

cjconroy commented 1 year ago

Here is another crazy georef by bot. I found this by mapping all of a certain subspecies of mole that should be generally south of the SF delta. Note the point in the northeast. It should be in with the bulk of the others near San Jose, CA. However, it plots to really the middle of nowhere. The locality is on a UC Reserve. "main road near barn, Blue Oak Ranch Reserve, Santa Clara County, California". However, it plots near the end of "Oak Ranch Road" in Sierra County. At least this one was very obvious. The bot georeffed localities are sometimes pretty close, but also sometimes far off like this.

[Two screenshots, 2023-03-01]

cjconroy commented 1 year ago

How do I find all MVZ mammals with georeference protocol = automated georeference? I've turned on all locality options and it does not seem to be there.

cjconroy commented 1 year ago

I can't find georeference protocol on the search for locality page either. I think I must be missing something.

[Screenshot, 2023-03-01]

Nicole-Ridgwell-NMMNHS commented 1 year ago

It's under locality attributes. Attribute type = georeference protocol, Attribute value = Automatically created by Arctos services. OR Attribute determiner = georeference bot

cjconroy commented 1 year ago

Sorry, I still cannot find it. Attached are screen shots of my main search page and locality search pages with drop downs under locality attributes. There is no georeference protocol. Can you post a picture?

[Two screenshots, 2023-03-02]

Nicole-Ridgwell-NMMNHS commented 1 year ago

Sorry, I got protocol and source mixed up. Georeference source is what you need.

cjconroy commented 1 year ago

Thanks, found it. I thought it would be something in a drop down menu.

cjconroy commented 1 year ago

Here is another bad bot georef. This is another one that should have gone to the Blue Oak Ranch Reserve near San Jose, CA, but like the example above, plots to the wrong county. This is LocID 10114917, MVZ:Mamm:218705. It should at minimum have gone to the correct county, then ideally to the Blue Oak Ranch Reserve, but there probably is no polygon for that.

[Images: Dark Canyon Creek; Blue Oak Ranch]

jldunnum commented 1 year ago

I applaud Chris for working through these. Something I have not yet been able to make time for. Disturbs me to think about the poor georeference data we now have in our collection. Returning to my original thoughts on this, no data are far better than bad data. I would prefer all of our automagiced georeferences were automagiced back. We were working our way through the non georeferenced records prior to this.


cjconroy commented 1 year ago

I'm only throwing out a few examples. I also have not set aside the workforce it will take to correct these. More than 11,000 MVZ mammals were autogeoreffed and need to be reviewed. This is after a report found more than 25,000 MVZ mammals with mismatch in higher geography between georef and stated locality. It is going to take multiple lifetimes to fix these. The autogeoref is only making the pile a lot taller.

ebraker commented 1 year ago

Just joining the conversation since I hadn't seen this thread. I'm wary of the fact that georeference_protocol can easily be left out when people download results due to the results form customization option. If someone selects lat and long coordinate fields, datum and error, it then looks like all associated coordinate data are curatorially asserted since the "automated georeference" remark is only housed under the georeference_protocol field. They will miss the fact that it was a bot.

ebraker commented 1 year ago

There's a lot of concerns voiced here, and the ability to verify coordinates is not feasible for me anytime soon, and I assume for most other folks since it is such a gargantuan task (honestly could be years). I am extremely concerned we are pushing a lot of inaccurate data out there that is extremely difficult for a user to discern that these are not curatorially asserted. I don't think we should be publishing anything or pushing to aggregators anything that is not verified.

jldunnum commented 1 year ago

I second Emily's comments!


cjconroy commented 1 year ago

third

dustymc commented 1 year ago

Is there a way to opt out

Bot to automatically georeference localities which have no georeference nor locality attribute of type georeference source. This bot needs, and should receive, no additional permissions.

(And I think it'll ignore named localities too.)

suggested_lat and suggested_long

We've had that for years (since biogeomancer).

[Screenshot, 2023-04-10]

Jegelewicz commented 1 year ago

Is there a reason this is closed? It seems like there isn't a resolution.

Also, and I think I have said this elsewhere, but I can't find it - we now have two different places for designating georeference source. If we only had one, maybe the issue @ebraker brings up would be moot?

dustymc commented 1 year ago

If we only had one,

That's one of the reasons I've been trying to prioritize https://github.com/ArctosDB/arctos/issues/5193 for about as long as I can remember....

ebraker commented 1 year ago

I understand that there is not currently a way to turn the bot off, but I'm asking if we can make it so (default off, with people opting in). Reading back through this thread there was a suggestion by Dusty:

I suppose there could be some sort of 'go away automation'

AND/OR I am suggesting an overt field title that specifically states "automated_lat" "automated_long" or "bot/suggested/etc" so that it is separate from our existing lat/long fields and makes it entirely clear to an unsuspecting user that we as data providers may have never approved nor set eyes on this information. I strongly feel that this is unauthorized data that is so easily integrated in datasets, especially when the results form is customizable and "automated georeference" falls away. It needs to be hardcoded into field name. I think lat/long should be reserved for verified data (or at least data that a human operator approved to load into Arctos). There could be a workflow - once an operator has reviewed the bot coordinates, they move out of automated_lat/long and into designated coordinate fields.
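The proposed workflow could look something like the sketch below. It is only an illustration of the idea: `automated_lat` and `automated_long` are the field names suggested above, `dec_lat` and `dec_long` stand in for the existing coordinate fields, and the class, the `reviewed_by` field, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalityCoordinates:
    # Coordinates a human has asserted or approved.
    dec_lat: Optional[float] = None
    dec_long: Optional[float] = None
    # Unreviewed bot suggestions, kept in separately named fields.
    automated_lat: Optional[float] = None
    automated_long: Optional[float] = None
    reviewed_by: Optional[str] = None  # hypothetical field, for illustration


def promote_automated(loc: LocalityCoordinates, reviewer: str) -> None:
    """Move bot coordinates into the asserted fields once an operator approves them."""
    if loc.automated_lat is None or loc.automated_long is None:
        raise ValueError("no automated coordinates to review")
    if loc.dec_lat is not None or loc.dec_long is not None:
        raise ValueError("refusing to overwrite asserted coordinates")
    loc.dec_lat, loc.dec_long = loc.automated_lat, loc.automated_long
    loc.automated_lat = loc.automated_long = None
    loc.reviewed_by = reviewer


def export_coordinates(loc: LocalityCoordinates) -> dict:
    """A download that selects only lat/long never picks up unreviewed bot values."""
    return {"dec_lat": loc.dec_lat, "dec_long": loc.dec_long}
```

With this layout the provenance lives in the field name itself, so it cannot fall away when a user customizes the results form.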

we now have two different places for designating georeference source. If we only had one, maybe the issue @ebraker brings up would be moot?

It would be an improvement but it still doesn't help the core issue here.

I still advocate for the solutions provided above, but there are 'match type' and 'scores' available from the bot (which aren't defined anywhere) that I assume convey fitness - perhaps we can implement a very high threshold, e.g. only georeferences with >95% confidence are appended to records.

edited: To be clear, I think automated georeferencing is a cool feature - it is just so imperfect with complex localities, aquatic localities, etc. and we as data providers are considered to be authorities that serve up accurate information. Since no one else is doing this sort of automation, our audiences are very unlikely to even know to interrogate provided coordinate data further.

Nicole-Ridgwell-NMMNHS commented 1 year ago

especially when the results form is customizable and "automated georeference" falls away

Georeference protocol isn't even visible by default.

ebraker commented 1 year ago

@mkoo when does the geography committee meet next? I find this issue super distressing and I'm going to be out of town (presenting on Arctos!) next Issues meeting.

campmlc commented 1 year ago

See also #6151

wellerjes commented 1 year ago

My issue with "Bot to automatically georeference localities which have no georeference nor locality attribute of type georeference source" is that we have had individuals go through each locality and manually verify that something is not specific enough, or there's something inconsistent about the verbatim data, and we have evaluated it as unable to be accurately georeferenced, but the bot doesn't recognize that. I've been finding catalog records that have a comment attached to locality remarks that say "cannot georeference due to xyz", but then the bot has made a guess anyway.

Example: https://arctos.database.museum/guid/CHAS:Herb:1880.9.200 [image]

jldunnum commented 1 year ago

OK so what are the options here? I think it is very clear that all collection managers who have weighed in are in agreement that this experiment did not work and the bot has resulted in major georeferencing errors across our collections. Can we revert to prebot values?

dustymc commented 1 year ago

@jldunnum you requested that in another issue, I'm working on it.