CatalogueOfLife / data

Repository for COL content

Update unparsed metadata #272

Open mdoering opened 3 years ago

mdoering commented 3 years ago

Some COL sources still have concatenated names in their metadata. Update the sources, patches, releases, coldp/acef/dwca archives - whatever is necessary:

A person object is expressed as follows in the postgres view below: (given, family, email, orcid)

For example: (Karl, Marx, karl@kapital.de,)

If a comma is part of the value it will be quoted: ("Karl,Michael", Marx, karl@kapital.de,)
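
A minimal sketch of reading such a person literal back into its four fields, assuming the quoting only uses double quotes as in the examples above (backslash escapes are not handled). The names `Person` and `parse_person` are hypothetical and are not part of any COL code; empty fields are mapped to `None`.

```python
import csv
from typing import NamedTuple, Optional


class Person(NamedTuple):
    """Hypothetical container mirroring (given, family, email, orcid)."""
    given: Optional[str]
    family: Optional[str]
    email: Optional[str]
    orcid: Optional[str]


def parse_person(literal: str) -> Person:
    """Parse a literal such as '(Karl, Marx, karl@kapital.de,)'."""
    text = literal.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"not a parenthesised record: {literal!r}")
    # The csv reader honours double-quoted fields, so a quoted comma
    # stays inside its value instead of splitting the field.
    rows = list(csv.reader([text[1:-1]], skipinitialspace=True))
    row = rows[0] if rows else []
    if len(row) != 4:
        raise ValueError(f"expected 4 fields, got {len(row)}: {literal!r}")
    # Assumption: an empty field means "no value".
    return Person(*(value.strip() or None for value in row))
```

For instance, `parse_person('(,bourgoin@mnhn.fr,,)')` puts the email address into `family`, which is exactly the kind of problem listed below.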

Email as family name:

 key  |         alias         |                                contact                                 
------+-----------------------+------------------------------------------------------------------------
 1011 | FLOW                  | (,bourgoin@mnhn.fr,,)
 1014 | ICTV MSL              | (,info@ictvonline.org,,)
 1103 | Strepsiptera Database | (,jeyaraney.kathirithamby@zoo.ox.ac.uk,,)
 1142 | The White-Files       | (,ouvrard@mnhn.fr,,)
 1174 | PaleoBioDB            | (,"Matthew Clapham <sec@paleobiodb.org>, Mark Uhen <muhen@gmu.edu>",,)
 2144 | ITIS                  | (,itiswebmaster@itis.gov,,)

Concatenated names in a single contact's family name:

 key  |       alias        |                                contact                                 
------+--------------------+------------------------------------------------------------------------
 1018 | LepIndex           | (,"Adrian Hine, Natural History Museum",,)
 1022 | Parhost            | (,"S Medvedev, A Lobanov",,)
 1030 | TicksBase          | (,"AM Nijhof, AA Guglielmone",,)
 1120 | FADA Ephemeroptera | (,"H. Barber-James, M. Sartori ,  J.L. Gattolliat",,)
 1138 | FADA Cladocera     | (,"A. Kotov,L.  Forró, N.M. Korovchinsky , A. Petrusek",,)

Concatenated names in author family name:

 key  |       alias        |                    authors                     
------+--------------------+------------------------------------------------
 1040 | AnnonBase          | {"(\"(eds)\",\"Rainer H. & Chatrou L.W.\",,)"}
 1099 | WoRMS_Oligochaeta  | {"(,\"Timm, T. & Erséus, C.\",,)"}
 1108 | WoRMS_Brachyura    | {"(,\"Ng, P. K. L. & Davie, P.\",,)"}
 1127 | WoRMS_Cestoda      | {"(,\"Bray, R. & Tyler, S.\",,)"}
 1128 | WoRMS_Trematoda    | {"(,\"Cribb, T. & Gibson, D.\",,)"}
 1132 | WoRMS_Chaetognatha | {"(,\"Thuesen, E.V. & Pierrot-Bults, A.\",,)"}
 1180 | WoRMS_Ctenophora   | {"(,\"Mills, C.E. Internet\",,)"}
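
A rough sketch of how family names like the ones in the three tables above could be flagged automatically. The function name `family_name_issues` and both patterns are assumptions made for illustration, not the query that produced these listings; the heuristic will also flag legitimate family names that happen to contain a comma or "and".

```python
import re
from typing import List, Optional

# Heuristic patterns (assumptions, deliberately loose).
EMAIL_RE = re.compile(r"[^@\s<>]+@[^@\s<>]+\.[^@\s<>]+")
SEPARATOR_RE = re.compile(r"[,;&]|\band\b", re.IGNORECASE)


def family_name_issues(family: Optional[str]) -> List[str]:
    """Return the problems suspected in a single family-name value."""
    if not family:
        return []
    issues = []
    if EMAIL_RE.search(family):
        issues.append("email")          # an address stored as a name
    if SEPARATOR_RE.search(family):
        issues.append("concatenated")   # several people in one field
    return issues
```

Applied to the rows above, `family_name_issues("bourgoin@mnhn.fr")` reports an email, `family_name_issues("S Medvedev, A Lobanov")` reports a concatenation, and the PaleoBioDB contact triggers both.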

camiplata commented 1 year ago

@mdoering We checked all datasets, and the metadata problems are resolved for all but one: LepIndex (1018). Can you please give us access to LepIndex so we can fix the issue?

mdoering commented 1 year ago

I added both of you as editors to LepIndex. It has been replaced by the new dataset, so it is no longer a COL source. Still, it would be nice to have its metadata fixed.

DianRHR commented 1 year ago

I made a small correction to the contact name. No more problems detected in the metadata of the datasets you mentioned in your first comment.