gbif / crawler

The crawling pieces - ws, cli, coordinator
Apache License 2.0

Because of current BioCASE metadata mapping, dataset owner names aren't included in citation #59

Open ManonGros opened 1 year ago

ManonGros commented 1 year ago

During BioCASe synchronisation, the dataset owner is mapped to the administrative contact only. Administrative contacts aren't included in the auto-generated dataset citation (only originators and metadata authors are: https://www.gbif.org/faq?question=how-is-the-dataset-citation-text-auto-generated). As a result, dataset owners are missing from the citation string, as in this example: https://www.gbif.org/dataset/e0908eee-ad49-4e91-b4d0-1f05dd17b291#citation

Would it be possible to also map the DatasetDerivations/DatasetDerivation/Rights/LegalOwner/ to ORIGINATOR?

https://github.com/gbif/crawler/blob/bde1c0c9525caff25fe23114ca810018a17f642f/crawler-metasync/src/main/java/org/gbif/crawler/metasync/protocols/biocase/model/abcd12/SimpleAbcd12Metadata.java#L250

MattBlissett commented 1 year ago

This is more complicated than I expected.

Citations are only generated for authors with a last name, and BioCASe only provides a single name field. That field contains names like Prof. Dr. Angelika Brandt, C. Dilger-Endrulat, Prof. Dr. H. Schubert, Dr. Matthias Nuß.

The registry only has firstName and lastName fields. Currently, the whole name is put into firstName, and contacts without a lastName are ignored when generating citations.

I could instead put the whole name into lastName, but then we'd have citations containing the full string, e.g. Prof. Dr. Angelika Brandt.

Or I could attempt parsing the names (!), so we'd have citations like Brandt A, Dilger-Endrulat C, Schubert H and Nuß Matthias, but then the titles would also be absent from the contact.
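To illustrate why these contacts currently drop out of the citation, here is a minimal sketch of the rule described above. It is not the registry's actual code: the `Contact` record, the `authors` method and the "Jane Doe" contact are all made up for the example, and the only behaviour taken from this thread is that a contact without a lastName is skipped.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CitationFilterSketch {

    // Hypothetical stand-in for a registry contact; only the two name fields matter here.
    record Contact(String firstName, String lastName) {}

    // Assumed rule, taken from the discussion: contacts without a lastName are ignored.
    static String authors(List<Contact> contacts) {
        return contacts.stream()
            .filter(c -> c.lastName() != null && !c.lastName().isBlank())
            .map(c -> c.firstName() == null || c.firstName().isBlank()
                ? c.lastName()
                : c.lastName() + " " + c.firstName().trim().charAt(0))
            .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<Contact> contacts = List.of(
            // BioCASe-style contact: the whole name sits in firstName, lastName is empty.
            new Contact("Prof. Dr. Angelika Brandt", null),
            // Invented contact with both fields filled in.
            new Contact("Jane", "Doe"));
        System.out.println(authors(contacts)); // prints "Doe J"
    }
}
```

With the whole name stored in firstName, the BioCASe contact never reaches the citation string, whatever contact type it is mapped to.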

BKlasen commented 1 year ago

Hi Matt, thank you for your efforts. Is it possible to parse the names and use the parsed names in the citation, but keep the original name in the contact field?

Birgit (from LIB in Germany, who raised the issue)
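The suggestion above (parse the name for the citation, keep the original for the contact) could be sketched roughly as follows. This is only an illustration under strong assumptions: `ParsedContact`, `parse` and the `TITLES` list are hypothetical, the last remaining token is naively treated as the last name, and every given name is reduced to an initial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NameParseSketch {

    // Assumed list of titles to drop; a real list would need to be much longer.
    private static final Set<String> TITLES = Set.of("prof.", "prof", "dr.", "dr");

    // Hypothetical holder: keeps the untouched original next to the parsed parts.
    record ParsedContact(String originalName, String firstName, String lastName) {

        // Builds "LastName Initials", e.g. "Brandt A"; falls back to lastName alone.
        String citationName() {
            StringBuilder initials = new StringBuilder();
            for (String part : firstName.split("\\s+")) {
                if (!part.isEmpty()) {
                    initials.append(Character.toUpperCase(part.charAt(0)));
                }
            }
            return initials.length() == 0 ? lastName : lastName + " " + initials;
        }
    }

    // Naive parse: drop known titles, treat the last token as the last name.
    static ParsedContact parse(String fullName) {
        List<String> tokens = new ArrayList<>();
        for (String token : fullName.trim().split("\\s+")) {
            if (!token.isEmpty() && !TITLES.contains(token.toLowerCase(Locale.ROOT))) {
                tokens.add(token);
            }
        }
        if (tokens.isEmpty()) {
            return new ParsedContact(fullName, "", "");
        }
        String last = tokens.get(tokens.size() - 1);
        String first = String.join(" ", tokens.subList(0, tokens.size() - 1));
        return new ParsedContact(fullName, first, last);
    }

    public static void main(String[] args) {
        ParsedContact c = parse("Prof. Dr. Angelika Brandt");
        System.out.println(c.originalName()); // prints "Prof. Dr. Angelika Brandt"
        System.out.println(c.citationName()); // prints "Brandt A"
    }
}
```

A heuristic like this will get multi-word surnames (for example ones with "van" or "de" particles), suffixes and organisation names wrong, which is part of why name parsing is hard to do reliably.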

ManonGros commented 1 year ago

Hi Matt, have you had a chance to look into parsing the names? If it isn't doable, perhaps we shouldn't discard the contacts that are missing names?