SpeciesFileGroup / taxonworks

Workbench for biodiversity informatics.
http://taxonworks.org
MIT License

Missing geographic areas #1546

Open adrik29 opened 4 years ago

adrik29 commented 4 years ago

As far as I understand, TW uses a controlled set of toponyms.

(1) Some are duplicated, and I don't know exactly why. It seems to be because they came from different sources. Which one should I use: only "Brazil", or "South America: Brazil", for example?

(2) I don't know if they "cascade", that is, if they propagate upwards when used in "Asserted distribution". For example, if I mark species Aus bus as recorded from Clatsop County, Oregon, USA, should I expect the name Aus bus to pop up when I list related data from Oregon or from the USA as well?

(3) How can I report missing areas? I'm not talking only of crazy countries that change names or are newly formed, but of traditional provinces that are simply missing from our list of controlled terms. For example: https://en.wikipedia.org/wiki/Department_of_Puno

(4) For huge and megadiverse countries such as Brazil, shouldn't we also include 2nd order administrative divisions, as is done for the USA?

mjy commented 4 years ago

Thanks @adrik29, all good questions that others have asked. We'll move these answers to taxonworks_doc at some point.

Why are there duplicate Geographic areas?

Managing geographic areas is hard, maybe as hard as taxon names. There is a great deal of homonymy, synonymy, etc. We build the GeographicArea table according to a strictly algorithmic approach, i.e. no individual human curation is allowed. This prevents inconsistencies from being introduced across broad scales, but it comes at the cost of things like "duplicate" records (but see below). The algorithm has the following principles:

Does selecting a finer subdivision allow us to infer the parent subdivisions (i.e. does assignment "cascade" up)?

Yes. We can determine this from the parent hierarchy, if available (note it might not be comprehensive), and from a spatial calculation of where the entity might fit within the hierarchy (if the entity has a shape).
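As a rough illustration of the hierarchy half of that answer, here is a minimal sketch of walking parent links upward. The `Area` class and the helper names are hypothetical and are not TaxonWorks code; the example assumes each area stores at most one parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Area:
    """Hypothetical stand-in for a geographic area with one optional parent."""
    name: str
    parent: Optional["Area"] = None


def self_and_ancestors(area: Area) -> List[str]:
    """Return the area's name followed by the names of all its ancestors."""
    names = []
    node: Optional[Area] = area
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names


def cascades_to(asserted: Area, queried_name: str) -> bool:
    """True if a record asserted for `asserted` should also show up under `queried_name`."""
    return queried_name in self_and_ancestors(asserted)


# Illustrative hierarchy taken from the question above.
usa = Area("United States")
oregon = Area("Oregon", parent=usa)
clatsop = Area("Clatsop", parent=oregon)
```

With this toy hierarchy, `cascades_to(clatsop, "Oregon")` is true while `cascades_to(oregon, "Clatsop")` is false, i.e. assertions propagate upward only. Where the parent chain is incomplete, the spatial calculation mentioned above would have to fill the gap.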

Which GeographicArea should I use?

One with a shape. The TaxonWorks approach is to get people to think spatially as a priority. We can recalculate string names when countries change if we have shapes; we cannot recalculate shapes if we only have string names. As soon as we bind our data to spatial assertions we have made a huge leap forward in the long-term use of our data. Additionally:

Should I always use a GeographicArea in my CollectingEvent?

Not necessarily. If you assign a point georeference you need not assign a GeographicArea, as the country, state, etc. string names will be filled in automatically in the DwC index. GeographicAreas do, however, prevent you from assigning point or other spatial georeferences outside the shape assigned to that GeographicArea, i.e. they act as a validation measure.
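To make the validation idea concrete, here is a minimal sketch of checking whether a point georeference falls inside an area's shape, using a plain ray-casting test. This is not how TaxonWorks implements it (a real system would use a spatial database or GIS library); the function name and the rectangle are assumptions for illustration, and the rectangle is a made-up box, not the real boundary of any county.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(lon: float, lat: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by the ray.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular shape standing in for a GeographicArea's geometry.
example_shape = [(-124.0, 45.8), (-123.3, 45.8), (-123.3, 46.3), (-124.0, 46.3)]

accepted = point_in_polygon(-123.8, 46.1, example_shape)  # point inside the box
rejected = point_in_polygon(-122.0, 46.0, example_shape)  # point east of the box
```

A georeference like the second one would be flagged as inconsistent with the assigned area, which is the sense in which the area acts as a validation measure.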

What is the scope of the GeographicArea table?

GeographicAreas are intended to cover the world down to the second (third if countries are included) level of geopolitical subdivision. For example, down to counties (country, state, county) in the United States.

How do I suggest new gazetteers for missing entities?

Open an issue here. The issue should

Will I be able to edit my own GeographicAreas?

Ultimately yes, this will be the "Gazetteer" model.

adrik29 commented 4 years ago

Thanks a lot, Matt! I'm still looking for the many ways to show areas:

(1) we can list them in the Geographic areas module. https://sfg.taxonworks.org/geographic_areas/33651

(2) Areas attributed to a taxon also appear in the browse OTUs module, which is called Browse taxa in TW: https://sfg.taxonworks.org/tasks/otus/browse/557353#asserted-distributions

(3) One record at a time is shown in the "related data" in OTUs module: https://sfg.taxonworks.org/asserted_distributions/173106

(4) And we can have combined distributions for supraspecific taxa: https://sfg.taxonworks.org/tasks/gis/otu_distribution_data?taxon_name_id=326135

This is more or less what I could gather.