gbif / doc-georeferencing-best-practices

This publication provides guidelines to the best practice for georeferencing. Though it is targeted specifically at biological occurrence data, the concepts and methods presented here can be applied in other disciplines where spatial interpretation of location is of interest.
https://doi.org/10.15468/doc-gg7h-s853

3.1.1 Localities standardization #37

Closed RicardoOrtizG closed 3 years ago

RicardoOrtizG commented 3 years ago

Section 2.2 (Localities) has good guidelines for documenting localities, but they emphasize field documentation or digitization into a database. I see no recommendation for standardizing localities as part of the georeferencing process. At SiB Colombia we divide the general georeferencing workflow into two phases: locality standardization and georeferencing per se (adding coordinates). The first phase lets us reduce the number of localities to georeference and integrate localities from different sources into one georeferencing project. During this phase, misspellings, redundancy, and incorrect documentation are corrected so that the localities can be classified properly later.

For example, a locality documented only with county information is adjusted: the value is moved to the corresponding DwC element and the locality is left empty. With tools like OpenRefine we look for clusters of the same locality described in different ways, or with misspellings, and correct them.
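The clustering step described above can be sketched in Python. This is only a rough illustration of key-collision clustering, loosely modeled on the fingerprint idea used by OpenRefine; it is not OpenRefine's implementation and not the SiB Colombia workflow. The function names `fingerprint` and `cluster_localities` and the sample locality strings are assumptions made up for the example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical strings collide."""
    # Lowercase, then decompose accented characters and drop the accents.
    decomposed = unicodedata.normalize("NFKD", value.strip().lower())
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces (a simplification chosen for this sketch).
    no_punct = re.sub(r"[^\w\s]", " ", unaccented)
    # Sort and de-duplicate tokens so word order does not matter.
    return " ".join(sorted(set(no_punct.split())))


def cluster_localities(localities):
    """Group locality strings that share a fingerprint.

    Returns only groups with more than one distinct spelling,
    i.e. the candidates a person would review and merge.
    """
    groups = defaultdict(list)
    for locality in localities:
        groups[fingerprint(locality)].append(locality)
    return [members for members in groups.values() if len(set(members)) > 1]


# Hypothetical sample data, invented for illustration.
localities = [
    "Vereda El Carmen, km 5",
    "vereda el carmen km. 5",
    "Km 5, Vereda El Cármen",
    "Páramo de Sumapaz",
]

clusters = cluster_localities(localities)
for members in clusters:
    print(members)
```

A key-collision approach like this only catches differences in case, accents, punctuation, and word order. Genuine misspellings would need a distance-based method, and every proposed cluster should still be reviewed by a person before merging.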

Is there a reason to consider locality standardization no longer necessary or efficient for the georeferencing process? If not, I suggest that including some recommendations in the document could be helpful; we have documented some here

ArthurChapman commented 3 years ago

I think we have covered this in §3.1 Georeferencing process and specifically in §3.1.1. In §2.2 we are describing how to record and document a locality - the georeferencing process is covered later in the document.

tucotuco commented 3 years ago

I think Ricardo is right that we neglect the type of standardization he is talking about. We talk about standardizing higher geography.

I know that CONABIO also used the standardization of specific locality strings in their georeferencing workflows, but I was not able to get confirmation from them that it aided efficiency. It would be good if Ricardo could confirm that it does. I know that I recommended against it for use in BioGeomancer, because BioGeomancer could generally figure them out without the standardization and provide the georeferences automatically anyway. So, if we do include something about this detail, we will have to say that the efficiency gains have not been demonstrated unless someone actually has that information. The higher geography standardization is relatively easy by comparison, and offers the advantage of making it easy to divide the work into geographically based packages.



tucotuco commented 3 years ago

Addressed in a979e8eea4dab08bc873b43fa7665aeeb6c3dedf and 5295ad5089c9fad88ff7c99079fed028365d797c

RicardoOrtizG commented 3 years ago

I'm very sorry to answer so late. Unfortunately, in our past projects we didn't time georeferencing with and without locality standardization. But with a process like clustering in OpenRefine, for example, the number of unique localities definitely goes down. The same happens when the localities are cleaned of higher geography, elevation, or even coordinate information that should not be in the locality. I understand the decision John has taken, but I would like to know what kind of information could confirm the efficiency gains. Maybe I have it but need to process it. I'm thinking of some examples of the number of unique localities before and after standardization. Would that work?
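As a rough idea of what such a before/after comparison could look like, here is a minimal Python sketch. The `standardize` rule (collapse whitespace, strip trailing punctuation, ignore case) is a deliberately simple stand-in and not the actual SiB Colombia procedure; the function names and sample strings are hypothetical. Note that it measures only the reduction in unique strings, not the time saved in georeferencing.

```python
import re


def standardize(locality: str) -> str:
    """Very simple stand-in for a real standardization step."""
    # Collapse runs of whitespace and trim the ends.
    collapsed = re.sub(r"\s+", " ", locality).strip()
    # Drop trailing punctuation and ignore case differences.
    return collapsed.rstrip(".,;").casefold()


def reduction_report(localities):
    """Count unique locality strings before and after standardization."""
    before = len(set(localities))
    after = len({standardize(loc) for loc in localities})
    percent = round(100 * (before - after) / before, 1) if before else 0.0
    return {"unique_before": before, "unique_after": after, "reduction_pct": percent}


# Hypothetical sample data, invented for illustration.
raw = [
    "Vereda El Carmen",
    "vereda el carmen.",
    "Vereda  El Carmen ",
    "Laguna de Guatavita",
]

report = reduction_report(raw)
print(report)
```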

And I wonder if you could change the citation to SiB Colombia rather than SIB Colombia. Thanks a lot!