Open adrik29 opened 4 years ago
Thanks @adrik29, all good questions that others have asked. We'll move these answers to taxonworks_doc at some point.
Managing geographic areas is hard, maybe as hard as taxon names. There is a great deal of homonymy, synonymy, etc. We build the GeographicArea records according to a strictly algorithmic approach, i.e. no individual human curation is allowed. This prevents inconsistencies from being introduced across broad scales, but it comes at the cost of things like "duplicate" records (but see below). The algorithm has the following principles:
"A C" is not the same as "A B C". This ensures we prevent accidental homonymy, at the cost of replication.
Yes. Based on the parent hierarchy, if available (note it might not be comprehensive), and on spatial calculation (if the entity has a shape), we can determine where the entity might fit within the hierarchy.
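To make these two points concrete, here is a minimal sketch in Python. It is not TaxonWorks code: the `Area` record and the `rolls_up_to` helper are hypothetical names invented for illustration, and the sample hierarchy is an assumption. The idea is that an area is identified by its full chain of parents, so "A C" and "A B C" stay distinct, and that the same chain can answer whether a record rolls up to a broader area.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Area:
    """Hypothetical stand-in for a geographic area record (not the TaxonWorks model)."""
    name: str
    parent: Optional["Area"] = None

    def path(self) -> Tuple[str, ...]:
        """Full chain of names from the root down to this area."""
        if self.parent is None:
            return (self.name,)
        return self.parent.path() + (self.name,)


def rolls_up_to(area: Area, ancestor: Area) -> bool:
    """True if `ancestor` is `area` itself or appears in its parent chain."""
    node: Optional[Area] = area
    while node is not None:
        if node == ancestor:
            return True
        node = node.parent
    return False


# Two records named "C" with different parent chains are different areas.
a = Area("A")
b = Area("B", parent=a)
c_under_a = Area("C", parent=a)  # A C
c_under_b = Area("C", parent=b)  # A B C

# Illustrative three-level hierarchy.
usa = Area("United States")
oregon = Area("Oregon", parent=usa)
clatsop = Area("Clatsop", parent=oregon)
```

Under these assumptions, something recorded for Clatsop would also be found when listing data for Oregon or the United States, provided the parent chain is complete. Where the hierarchy is not comprehensive, a spatial calculation on shapes would have to fill the gap.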
One with a shape. The TaxonWorks approach is to get people to think spatially as a priority. We can recalculate string names when countries change if we have shapes; we cannot recalculate shapes if we only have string names. As soon as we bind our data to spatial assertions we have made a huge leap forward in the long-term use of our data. Additionally:
In fact, if you assign a point georeference you need not assign a GeographicArea, as it will automatically fill in the country, state, etc. string names in the DWC index. GeographicAreas do, however, prevent you from assigning point or other spatial georeferences outside the shape assigned to that GeographicArea, i.e. they act as a validation measure.
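As a rough illustration of both ideas (deriving names from a point, and rejecting a point that falls outside an area's shape), here is a small self-contained Python sketch. It is an assumption-laden toy, not how TaxonWorks does it: the function names are hypothetical, the shapes are made-up rectangles, and the containment test is a plain ray-casting check on planar coordinates, where a real system would rely on a spatial database or GIS library.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar
Polygon = List[Point]        # simple ring, no holes


def contains(polygon: Polygon, point: Point) -> bool:
    """Ray-casting point-in-polygon test on planar coordinates."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def validate_georeference(area_shape: Polygon, point: Point) -> bool:
    """Hypothetical check: accept the point only if it lies inside the area's shape."""
    return contains(area_shape, point)


def names_for_point(shapes: Dict[str, Polygon], point: Point) -> List[str]:
    """Names of every shape that contains the point, in dictionary order."""
    return [name for name, shape in shapes.items() if contains(shape, point)]


# Made-up rectangles, not real boundaries.
shapes = {
    "Country X": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "State Y": [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)],
}
```

With these toy shapes, `names_for_point(shapes, (2.0, 3.0))` finds both "Country X" and "State Y", while `validate_georeference(shapes["State Y"], (7.0, 7.0))` rejects a point that lies in the country but outside the state.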
GeographicAreas are intended to cover the world down to the second level of geopolitical subdivision (the third if countries are counted as a level). For example, down to counties (country, state, county) in the United States.
Open an issue here. The issue should
Ultimately yes, this will be the "Gazetteer" model.
Thanks a lot, Matt! I'm still looking for the many ways to show areas:
(1) We can list them in the Geographic areas module. https://sfg.taxonworks.org/geographic_areas/33651
(2) Areas attributed to a taxon also appear in the browse OTUs module, which is called Browse taxa in TW: https://sfg.taxonworks.org/tasks/otus/browse/557353#asserted-distributions
(3) One record at a time is shown in the "related data" in the OTUs module: https://sfg.taxonworks.org/asserted_distributions/173106
(4) And we can have combined distributions for supraspecific taxa: https://sfg.taxonworks.org/tasks/gis/otu_distribution_data?taxon_name_id=326135
This is more or less what I could gather.
As far as I understand, TW uses a controlled set of toponyms.
(1) Some are duplicated, I don't know exactly why. It seems that it's because they came from different sources. Which one should I use? Only "Brazil", or "South America: Brazil", for example?
(2) I don't know if they "cascade", that is, if they propagate upwards when used in "Asserted distribution". For example, if I mark species Aus bus as recorded from Clatsop County, Oregon, USA, should I expect the name Aus bus to pop up when I list related data from Oregon or from the USA as well?
(3) How can I report missing areas? I'm not talking only of crazy countries that change names or are newly formed, but of traditional provinces that are simply missing from our list of controlled terms. For example: https://en.wikipedia.org/wiki/Department_of_Puno
(4) For huge and megadiverse countries such as Brazil, shouldn't we also include 2nd order administrative divisions, as is done for the USA?