VertNet / bels

Biodiversity Enhanced Location Services
Apache License 2.0

Question #63

Closed samleeflang closed 1 year ago

samleeflang commented 1 year ago

Hi everyone,

This is more a question than an issue.

For DiSSCo we are looking at georeferencing tools for specimen data. We have a lot of data that contains only a locality string, and perhaps a country name or code, but nothing else. We hope to georeference these specimens based on their locality string. We made a couple of attempts with the BELS API but had little success.

A couple of examples:

curl -X POST -H "Content-Type: application/json" -d '{"give_me": "BEST_GEOREF", "for_location": {"ID":"1", "continent":"Europe", "countrycode":"FI", "locality":"Rauhalinna"}}' https://localityservice.uc.r.appspot.com/api/bestgeoref

But even relatively easy strings did not work, for example:

curl -X POST -H "Content-Type: application/json" -d '{"give_me": "BEST_GEOREF", "for_location": {"ID":"1", "continent":"Europe", "countrycode":"NL", "locality":"Rotterdam"}}' https://localityservice.uc.r.appspot.com/api/bestgeoref

We tried a number of specimens, including ones for which we know the locality string is in GBIF, but to no avail. Could you give us some pointers on what we are doing wrong? Is the tool suited to locality strings, or is it better to use it another way?

Any help will be much appreciated!

Regards, Sam

tucotuco commented 1 year ago

@samleeflang Sorry for the delay. I looked into the back-end data for the specific examples you provided and there is no matching string in BELS for either of them, so the results are as expected given what is in BELS. There are 42 distinct records that have "Rauhalinna" in them, but none with the combination "firauhalinna", which is how the first input would be interpreted.

BELS does not try to match the locality alone; it tries to match the entire input after applying many simplifications to that input. One example of a simplification is the removal of the continent from consideration. We found it was more of an obstacle than an aid. Another simplification is to interpret all country input as an ISO country code. There are lots of others.

But the specific reason no match came back from BELS is that the combination you provided is not found in the GBIF snapshot used in BELS today (from 2022-07-14), nor in the other sources used to build BELS. The same is true for the Rotterdam example. There are 2264 records with "rotterdam" in them, but none that is "nlrotterdam".

For locations as simple as the ones you have shared, with named places only, it should be possible to entirely automate the georeferencing with GEOLocate, assuming that the features are in the sources for that tool. I hope that helps to explain what is going on at least. Let me know if there is any other way in which I can help.
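The matching behaviour described above can be illustrated with a minimal sketch. It is not the BELS implementation: the function name `simplified_key`, the tiny country lookup, and the exact normalisation steps are assumptions chosen only to reproduce the two keys mentioned in the comment ("firauhalinna" and "nlrotterdam").

```python
# Hypothetical, tiny lookup; a real service would need a full ISO 3166 table.
COUNTRY_TO_ISO = {"finland": "FI", "netherlands": "NL"}


def simplified_key(location: dict) -> str:
    """Build an illustrative match key from a location record.

    Assumed simplifications (only the first two are stated in the thread):
      * the continent is ignored entirely,
      * the country is reduced to an ISO country code,
      * the result is lowercased and stripped of non-alphanumeric characters.
    """
    country_name = location.get("country", "").strip().lower()
    country_code = location.get("countrycode") or COUNTRY_TO_ISO.get(country_name, "")
    locality = location.get("locality", "")
    raw = f"{country_code}{locality}".lower()
    # Keep letters and digits only, so spaces and punctuation do not affect the key.
    return "".join(ch for ch in raw if ch.isalnum())


if __name__ == "__main__":
    record = {"ID": "1", "continent": "Europe", "countrycode": "FI", "locality": "Rauhalinna"}
    print(simplified_key(record))
```

Under these assumptions, a lookup succeeds only if the whole key exists in the reference data, which is why a locality that appears in many records can still return no match.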

samleeflang commented 1 year ago

Hi John,

Thanks for the explanation! If I understand you correctly, you would suggest using GEOLocate for the relatively simple locality strings? Would BELS be better suited to more complex locality strings, in the hope that they match already georeferenced records?

Kind regards, Sam

tucotuco commented 1 year ago

I would actually use BELS in bulk via the web app as a first pass, then send whatever did not get results through GEOLocate to automatically georeference everything it can. Then start with the manual process on everything else with a combination of:

- Georeferencing Best Practices
- Georeferencing Quick Reference Guide
- Georeferencing Calculator Manual
- GeoPick or its alternate instance
- Google Earth
- Google Maps
- Wikipedia
- Google Search
- etc.

samleeflang commented 1 year ago

That's very helpful, thanks. We will probably build something that is easy to pilot and follows this flow: BELS -> GEOLocate -> Nominatim (at the back of GeoPick, based on OSM) -> indicate that a manual action is required (with links to the guides). As we want to pilot it first, we will do single records or maybe some small batches. Before we run large batches we want to contact the individual service providers to ensure that they can handle it. The results will become annotations on the specimen, with the idea that they will be reviewed by a human before we update the actual specimen information.
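The flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `georeference` function, the resolver callables, and the annotation fields are hypothetical, and no real BELS, GEOLocate, or Nominatim client is called here; each would have to be wrapped in a function that returns a result or `None`.

```python
from typing import Callable, List, Optional, Tuple

# A resolver takes a location record and returns a georeference dict, or None if it has no result.
Resolver = Callable[[dict], Optional[dict]]


def georeference(record: dict, resolvers: List[Tuple[str, Resolver]]) -> dict:
    """Try each service in order and return an annotation for human review.

    The annotation layout is a hypothetical example, not a DiSSCo format.
    """
    for name, resolve in resolvers:
        result = resolve(record)
        if result is not None:
            # The first service that answers wins; later services are not called.
            return {"id": record.get("ID"), "status": "proposed",
                    "source": name, "georef": result}
    # Nothing matched: flag the record so a person can georeference it manually.
    return {"id": record.get("ID"), "status": "manual_action_required",
            "source": None, "georef": None}


if __name__ == "__main__":
    # Stub resolvers standing in for the real services (assumptions for illustration).
    def no_match(record: dict) -> Optional[dict]:
        return None

    pipeline = [("BELS", no_match), ("GEOLocate", no_match), ("Nominatim", no_match)]
    record = {"ID": "1", "countrycode": "NL", "locality": "Rotterdam"}
    print(georeference(record, pipeline))
```

Keeping the services behind a common callable makes it straightforward to pilot on single records first and to add rate limiting per provider before running larger batches.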