CatalogueOfLife / xcol

Working towards the extended Catalogue of Life Checklist

Find more resources for authorship abbreviations #132

Open mdoering opened 4 months ago

mdoering commented 4 months ago

List readily accessible web resources that map author abbreviations used in names to standard abbreviations or full names.

DianRHR commented 3 months ago

I sent a request to IPNI today asking for their database of authors, with their complete names and corresponding abbreviations.

DianRHR commented 1 month ago

I finally got a long answer from Bob Allkin (IPNI). Besides other comments, he is asking:

  1. "What fields would be useful for us?" In my opinion, these would be useful: id | version | default_author_name | default_author_forename | default_author_surname | standard_form | dates | date_type_string | alternative_abbreviations | alternative_names | taxon_groups | taxon_groups_flat. These other fields are also available, but they probably aren't useful: name_notes | name_source | date_type_code | example_of_name_published | comments | author_iso_countries | suppressed

  2. He pointed out that, since IPNI is a living resource that is constantly being revised and added to, any one-off download will quickly go out of date and become inconsistent with the data currently held within IPNI. He asks whether we have considered the need for a regular refresh of these data and what frequency would be appropriate and feasible. If so, would it be manual updates or a more automated, API-type access? The latter would require upfront development on both sides but would save effort. What are your thoughts or preferences? My opinion is that we can first run a test with their data and see how it contributes to our process before thinking about more sophisticated development on either side.

@camiplata @mdoering your comments will be very helpful to answer him ...

mdoering commented 1 month ago

Thanks @DianRHR. It is not a priority at this stage I think, but the primary purpose for having those resources would be to compare authorships better when doing name matching.

In that light we only need a standard name and a list of known abbreviations or alternative spellings. The file we currently use is this: https://github.com/CatalogueOfLife/backend/blob/master/api/src/main/resources/authorship/authormap.txt
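As a rough illustration of that kind of comparison, here is a minimal sketch assuming a simple in-memory mapping from known abbreviations or alternative spellings to a standard form. The entries and the names `AUTHOR_MAP`, `normalize_author` and `same_author` are hypothetical; they reflect neither the actual format of authormap.txt nor the backend's matching code.

```python
# Hypothetical mapping: known abbreviation or alternative spelling -> standard form.
# Entries are illustrative only and are not taken from authormap.txt.
AUTHOR_MAP = {
    "l.": "L.",
    "linn.": "L.",
    "linnaeus": "L.",
    "dc.": "DC.",
    "de candolle": "DC.",
}


def normalize_author(author: str) -> str:
    """Return the standard form if the author is known, else a cleaned-up key."""
    # Lowercase and collapse whitespace so trivial differences do not matter.
    key = " ".join(author.lower().split())
    return AUTHOR_MAP.get(key, key)


def same_author(a: str, b: str) -> bool:
    """Compare two authorship strings after normalization."""
    return normalize_author(a) == normalize_author(b)
```

With such a map, `same_author("Linnaeus", "L.")` would treat both spellings as the same author, while unknown authors fall back to a case-insensitive string comparison.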

The other more advanced option for the future would be to curate a proper reference of authors that we can link to. An AuthorIndex like we have a NameIndex now. (TaxonIndex and ReferenceIndex on the horizon too). Such an index would need more properties and it would be best to share data in ColDP which has an Author entity in its recent version.

DianRHR commented 1 month ago

I got the complete, updated list of IPNI authors to use as a reference for comparing authors from the merge sectors. standard_form and alternative_abbreviations will be the most useful fields from this dataset. IPNIauthors.csv
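A possible way to turn those two fields into a lookup table is sketched below. It assumes that the CSV has a header row with `standard_form` and `alternative_abbreviations` columns and that multiple alternatives are separated by `;` within one cell. The separator, the function name `build_lookup` and the sample rows are assumptions for illustration, not the actual content of IPNIauthors.csv.

```python
import csv
import io


def build_lookup(csv_file, alt_sep=";"):
    """Map each lowercased abbreviation to its standard form.

    alt_sep is an assumed separator for the alternative_abbreviations cell.
    """
    lookup = {}
    for row in csv.DictReader(csv_file):
        standard = (row.get("standard_form") or "").strip()
        if not standard:
            continue  # skip rows without a standard form
        lookup[standard.lower()] = standard
        alts = row.get("alternative_abbreviations") or ""
        for alt in alts.split(alt_sep):
            alt = alt.strip()
            if alt:
                # Keep the first standard form seen for an ambiguous alternative.
                lookup.setdefault(alt.lower(), standard)
    return lookup


# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "standard_form,alternative_abbreviations\n"
    "L.,Linn.;Linné\n"
    "DC.,\n"
)
lookup = build_lookup(sample)
```

For the real file, the same function could be called with `open("IPNIauthors.csv", newline="", encoding="utf-8")`, adjusting `alt_sep` to whatever separator the export actually uses.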