CatalogueOfLife / data

Repository for COL content

HTML encoding in scientific names #138

Open aoern opened 4 years ago

aoern commented 4 years ago

@yroskov @gdower

There is HTML encoding and styling in scientific names in the June 2020 Edition:

  1. In GloBIS (GART) data:
       231401 Tmolus denarius Butler & H. Druce, 1872
       234365 Miletus dryone Smith & Kirby GloBIS (GART)

  2. In WoRMS data, 194 cases. Some examples:
       2430341 Sepia rostrata Férussac & d'Orbigny, 1848 [pro parte] WoRMS Mollusca
       2723058 Ceratosoma gracillimum Semper in Bergh, 1876 WoRMS Mollusca
       2754069 Ranella reticularis (Linnaeus, 1758) sensu Deshayes, 1839 WoRMS Mollusca
       2846424 Turbo porphyrites [sic, porphyria] WoRMS Mollusca

gdower commented 4 years ago

Thanks, @aoern! I think these are all issues in the data, so I transferred your issue to the data repo. For the GloBIS data, I'm assuming you're referring to the &amp; HTML entity.

mdoering commented 4 years ago

The backend replaces all HTML entities and similar problems, but we decided to use the verbatim authorship for exports to the legacy portal...
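
For illustration, the kind of cleanup described here could look roughly like the sketch below. It is not the COL backend code. The function names `has_html` and `clean_name` and both regular expressions are hypothetical, and the encoded inputs in the usage note are assumed reconstructions of what verbatim values might contain. It uses only Python's standard `html` and `re` modules.

```python
import html
import re

# Hypothetical patterns: simple HTML tags such as <i>, </i>, <span class="x">,
# and named or numeric character references such as &amp; or &#233;.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?>")
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def has_html(text: str) -> bool:
    """Return True if the string contains an HTML tag or character reference."""
    return bool(TAG_RE.search(text) or ENTITY_RE.search(text))


def clean_name(text: str) -> str:
    """Strip tags, decode character references, and collapse whitespace."""
    # Remove tags first so that a decoded &lt; is never mistaken for markup.
    without_tags = TAG_RE.sub("", text)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())
```

Under these assumptions, `clean_name("Tmolus denarius Butler &amp; H. Druce, 1872")` returns the name with a plain `&`, and `has_html` could be run over verbatim authorship fields to flag records before an export. A bare ampersand followed by a space, as in `Smith & Kirby`, is not flagged.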