CatalogueOfLife / data

Repository for COL content

Update Moss Tropicos source #269

Closed mdoering closed 1 year ago

mdoering commented 3 years ago

The moss data from Tropicos in COL is from 2004, which is badly outdated. We (GBIF) have also received user feedback about the awkward authorship style used for moss names in COL: it includes the year, is rather "zoological style", and is not in line with what Tropicos exposes. For example:

Orthotrichum acuminatum Philibert, 1881 (COL) vs Orthotrichum acuminatum H. Philib. (Tropicos)

TonyRees commented 3 years ago

OK, there are 2 issues here (actually 2.2...)

1: Whether or not to abbreviate the author surname. There is a long history of author surnames in botany being represented by a "standard form" e.g. L. for Linnaeus, etc., but it is not compulsory. The present Code merely says:

46A.2. When a name in an author citation is abbreviated, the abbreviation should be long enough to be distinctive, and should normally end with a consonant that, in the full name, precedes a vowel. The first letters should be given without any omission, but one of the last characteristic consonants of the name may be added when this is customary.... 46A.4. When it is a well-established custom to abridge a name in another manner, it is advisable to conform to custom. Ex. 4. DC. for Augustin-Pyramus de Candolle; St.-Hil. for Saint-Hilaire; Rchb. for H. G. L. Reichenbach.

I note that in AlgaeBase (extant+fossil algae) and numerous fossil algal compendia, the convention has been not to abbreviate author surnames, e.g. see https://www.algaebase.org/search/species/detail/?species_id=87 i.e. "Fucus vesiculosus Linnaeus 1753"; and the Lentin & Williams dinoflagellate index (2017 edition, as represented in the database DINOFLAJ3): http://dinoflaj.smu.ca/dinoflaj3/index.php/Subfamily_Pareodinioideae :

Type genus: Pareodinia Deflandre, 1947d; Jurassic (Callovian).
Fossil genera:
Arkellea Below, 1990; Jurassic (Oxfordian).
Gochteodinia Norris, 1978; Late Jurassic.
(etc.)

However in the Fungi (for example), Species Fungorum does use the long-standing "abbreviated" style, e.g. see http://www.speciesfungorum.org/GSD/GSDspecies.asp?RecordID=245934.

1.1 Prepend initials or not, for botanical taxa. This is a mixed bag; botanical practice is not to prepend them for the first instance of an author name (e.g. Smith), but then successively add them for subsequent "Smith" instances (different persons). For IRMNG I decided that this was silly, and I prepend them for all (or nearly all) instances, following the style used by Index Nominum Genericorum (ING) which has, for example:

Abdominea J. J. Smith, Bull. Jard. Bot. Buitenzorg ser. 2. 14: 52. Apr 1914.
Adenoderris J. Smith, Hist. Filicum 222. 1875.
Afzelia J. E. Smith, Trans. Linn. Soc. London 4: 221. 24 Mai 1798 (nom. cons.).
(etc.)

1.1.1 Spacing between initials... "Standard forms" from Kew have no spacing, example "J.Sm." (for Smith). I find this odd and like a space between the final full stop and the next word. Then there is the issue with multiple initials as per the ING examples above ("J. E. Smith", etc.). I close up the initials themselves and render this as "J.E. Smith" for IRMNG, but that is more of a "house style" similar to the preferred "reference style" for a journal, and CoL can do what it prefers to do, or support multiple styles as per the incoming data.
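As a rough illustration of the spacing choice described above, here is a minimal Python sketch that closes up consecutive initials and keeps one space before the surname. The function names are hypothetical and this is not IRMNG's or CoL's actual code. It assumes plain ASCII capitals, and it would also close up a single-letter standard form that follows an initial (such as "L." in "C. L."), so it is only a starting point.

```python
import re


def close_up_initials(author: str) -> str:
    """Remove the space between two consecutive initials.

    "J. E. Smith" -> "J.E. Smith". The space before the surname is kept,
    because the surname is not a single capital followed by a full stop.
    """
    return re.sub(r"(?<=\b[A-Z]\.) (?=[A-Z]\.)", "", author)


def space_before_surname(author: str) -> str:
    """Insert a space where an initial runs straight into a surname.

    "J.Sm." -> "J. Sm." (Kew-style standard forms have no spacing).
    """
    return re.sub(r"(?<=[A-Z]\.)(?=[A-Z][a-z])", " ", author)


def house_style_initials(author: str) -> str:
    """Apply both steps: initials closed up, one space before the surname."""
    return space_before_surname(close_up_initials(author))


if __name__ == "__main__":
    for raw in ["J. E. Smith", "J. J. Smith", "J.Sm.", "H. G. L. Reichenbach"]:
        print(raw, "->", house_style_initials(raw))
```

Keeping the two steps separate means a consumer that prefers the unspaced Kew form, or the fully spaced ING form, can apply just one of them or neither.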

2: Append year, or not. "Conventional" botanical usage does not append the year, except in the context of a cited publication, e.g. see the ING and Species Fungorum examples given above. However in algae (extant and fossil) the year is more often appended, see the AlgaeBase and Lentin+Williams examples given. In IRMNG, for genera I decided that the year was valuable and I would keep it where available (generally parsed out from the ING record), for all plant groups, not just algae. For the same reason I also "like" year for species names and include it where present in the incoming source, e.g. see https://www.irmng.org/aphia.php?p=taxdetails&id=1086407 (algal genus), where the species list starts:

Species Fucus abnormis Stackhouse
Species Fucus acicularis Esper, 1800
Species Fucus aculeatus Esper, 1798
Species Fucus acutus Turner
Species Fucus albidus Esper, 1800

(I think these are a mix of names from AlgaeBase and elsewhere)

My contention is that there is value in seeing whether a name is long established or relatively new, in seeing the date range when an author was active, in being able to sort or filter names by year published, and so on; the year also distinguishes homonyms in some rare but not unknown cases (same name, same author, different year).
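To make the sorting and filtering point concrete, here is a small Python sketch that pulls a trailing year off a name string and orders the dated names. It uses the Fucus names quoted above as sample input. The helper name and the regular expression are assumptions for illustration only; a real pipeline would more likely read the year from a parsed authorship field than from the end of a string.

```python
import re
from typing import Optional

# Trailing four-digit year, with an optional comma before it and an optional
# letter suffix after it (as in "Deflandre, 1947d").
YEAR_RE = re.compile(r",?\s*(\d{4})[a-z]?$")


def trailing_year(name: str) -> Optional[int]:
    """Return the year at the end of a name string, or None if there is none."""
    match = YEAR_RE.search(name.strip())
    return int(match.group(1)) if match else None


names = [
    "Fucus abnormis Stackhouse",
    "Fucus acicularis Esper, 1800",
    "Fucus aculeatus Esper, 1798",
    "Fucus acutus Turner",
    "Fucus albidus Esper, 1800",
]

# Names that carry a year, oldest first; names without a year are left out.
dated = sorted((n for n in names if trailing_year(n) is not None), key=trailing_year)
undated = [n for n in names if trailing_year(n) is None]

if __name__ == "__main__":
    print(dated)
    print(undated)
```

Names without a year simply cannot take part in this kind of ordering, which is the practical cost of the conventional botanical style.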

2.2 Where year is cited, prefix with a comma or not. This is not covered by the ICN, not being a convention in higher plants; algae databases differ (e.g. Lentin+Williams does, AlgaeBase does not), as do zoological databases. For IRMNG I presently use a comma, but this might change one day without loss in data value. E.g. FishBase changed from a comma to no comma during the last 10 years. However my eye does seem to prefer it (possibly influenced early on by CoL data format).
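Since the year and the comma are both presentation choices rather than data, one option is to store author and year separately and decide the style only at render time. The sketch below assumes that; the function and parameter names are hypothetical and do not correspond to any CoL or IRMNG API.

```python
from typing import Optional


def format_name(
    scientific_name: str,
    author: str,
    year: Optional[int] = None,
    include_year: bool = True,
    comma: bool = True,
) -> str:
    """Render a name with its authorship in a chosen style.

    include_year=False gives the conventional botanical form (no year).
    comma=True gives "Author, 1881"; comma=False gives "Author 1881".
    """
    rendered = f"{scientific_name} {author}"
    if include_year and year is not None:
        separator = ", " if comma else " "
        rendered += f"{separator}{year}"
    return rendered


if __name__ == "__main__":
    # Comma style, as in the COL moss example from this thread.
    print(format_name("Orthotrichum acuminatum", "Philibert", 1881))
    # No-comma style, as in the AlgaeBase example.
    print(format_name("Fucus vesiculosus", "Linnaeus", 1753, comma=False))
    # Conventional botanical style: year suppressed.
    print(format_name("Orthotrichum acuminatum", "H. Philib.", 1881, include_year=False))
```

With the fields kept apart, switching from comma to no comma (as FishBase did) is a display change with no loss in data value.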

Just my 2 cents, not prescriptive. However I think moving/nudging the "standard botanical" style slowly towards the "zoological style" is no bad thing, with the exception that I like to keep author initials for botanical citations.