AtlasOfLivingAustralia / biocache-service

Occurrence & mapping webservices
https://biocache-ws.ala.org.au/ws/

Downloads have 2 scientificName fields #856

Closed adam-collins closed 6 months ago

adam-collins commented 8 months ago

Files in downloads have two fields called scientificName

For examples, see:

https://doi.ala.org.au/doi/50c278f4-d540-4d07-967a-3122883ab50e
https://doi.ala.org.au/doi/01b0738d-272c-4171-a63f-476b06760691

Digging into the headings.csv of the first download above (which users are very unlikely to do) shows that one of the two fields is the provided value:

Column name | Requested field | DwC Name | Field name | Field description | Download field name | Download field description | More information
scientificName | raw_scientificName | scientificName | raw_scientificName | Scientific name (unprocessed) | raw_scientificName | Scientific Name - original | Original scientific name supplied with the record http://rs.tdwg.org/dwc/terms/scientificName
scientificName | scientificName | scientificName | scientificName | Scientific name | scientificName | Scientific Name (intepreted) | The name the Atlas has matched this record to in the NSL http://rs.tdwg.org/dwc/terms/scientificName

Shouldn't the headings.csv just contain the Download field name, Download field description and More information? Column name looks useful for searching for fields, but it is unhelpful in this case because both rows have the same value.
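For anyone wanting to check a download for this problem, a minimal sketch follows. It assumes the download is a plain comma-separated file with a single header row; the function name `duplicate_columns` and the sample rows are made up for illustration and are not part of biocache-service.

```python
import csv
import io
from collections import Counter


def duplicate_columns(csv_text):
    """Return header names that occur more than once, in first-seen order."""
    # Only the first row (the header) is inspected.
    header = next(csv.reader(io.StringIO(csv_text)))
    counts = Counter(header)
    duplicates = []
    for name in header:
        if counts[name] > 1 and name not in duplicates:
            duplicates.append(name)
    return duplicates


# Illustrative data only: two columns share the header scientificName.
sample = (
    "scientificName,scientificName,decimalLatitude\n"
    "Acacia dealbata,Acacia dealbata Link,-35.3\n"
)
print(duplicate_columns(sample))
```

With duplicated headers, tools that key columns by name (for example `csv.DictReader`) keep only the last column with that name, which is one reason the clash matters to users.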

raised by @CamSlatyer

adam-collins commented 8 months ago

draft issue by @peggynewman

adam-collins commented 8 months ago

@peggynewman @CamSlatyer I have done a short analysis of the issue, responded to questions and require feedback.

CamSlatyer commented 8 months ago

Hi Adam

Apologies if I am misunderstanding your email, but there may have been a definitional issue when the decision was taken to reclassify the field.

If we need to reclassify raw_ScientificName to be compliant with DwC, then the correct DwC field is verbatimIdentification ("A string representing the taxonomic identification as it appeared in the original record", https://dwc.tdwg.org/terms/#identification), not scientificName ("The full scientific name, with authorship and date information if known. When forming part of a dwc:Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the dwc:identificationQualifier term.", https://dwc.tdwg.org/terms/#dwc:scientificName).

Without understanding any of the programming issues, would it be possible to reclassify raw_ScientificName as verbatimIdentification?

Very happy to discuss with you and Peggy.

Best wishes Cam

adam-collins commented 8 months ago

To assist in clearing up the misunderstanding: we accept DwC fields from providers and may apply processing. We retain the provided DwC field value and present it on occurrence pages beneath the processed value, and we provide a button to compare "original vs processed" values, e.g. https://biocache.ala.org.au/occurrence/798c62b1-9196-49f2-8fc4-c69a59a1d95f.

We will not move a provided https://dwc.tdwg.org/list/#dwc_scientificName value into http://rs.tdwg.org/dwc/terms/verbatimIdentification, as doing so may replace a provided verbatimIdentification and would remove the ability to identify the provided scientificName.

peggynewman commented 8 months ago

Agree with Adam. We don't move values from one DwC term to another (e.g. scientificName to identification). Theoretically, the data provider (or the data team) puts all the terms into the system, and the system only maps from raw_ values to processed values.

Thanks for the long explanation Adam. Do we want to make big changes to Downloads? My take on this would be that Downloads should export DwCA or frictionless data packages, and that we should have a data dictionary containing all of our terms with URIs and descriptions (biocache fields would probably be fine) which would be included in the meta.xml. Users should be able to specify clearly whether they want raw and/or interpreted fields (selecting 'Miscellaneous' isn't clear).

I just did a download specifying 'Miscellaneous' to get the raw values and can't find raw_scientificName. The scientificName field is called species for some reason but raw_species is empty.

adam-collins commented 7 months ago

The fix is to apply the prefix raw_ to all raw fields in a download when dwcHeaders=true, e.g. raw_scientificName.

Pull request https://github.com/AtlasOfLivingAustralia/biocache-service/pull/858
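As a rough illustration of the intended behaviour (not the code in the pull request), the sketch below builds download headers from a list of fields. The function name `download_headers` and the `(field_name, dwc_name, is_raw)` tuple layout are hypothetical; only the `dwcHeaders` switch and the `raw_` prefix come from this thread.

```python
def download_headers(fields, dwc_headers):
    """Build column headers for a download.

    fields: list of (field_name, dwc_name, is_raw) tuples (hypothetical layout).
    dwc_headers: mirrors the dwcHeaders=true request parameter.
    """
    headers = []
    for field_name, dwc_name, is_raw in fields:
        if not dwc_headers:
            # Without DwC headers, use the internal field name unchanged.
            headers.append(field_name)
        elif is_raw:
            # Raw (provided) values keep the DwC name but gain the raw_ prefix,
            # so they no longer clash with the processed column.
            headers.append("raw_" + dwc_name)
        else:
            headers.append(dwc_name)
    return headers


fields = [
    ("raw_scientificName", "scientificName", True),
    ("scientificName", "scientificName", False),
]
print(download_headers(fields, dwc_headers=True))
```

Before the change, both entries would have been written as scientificName when DwC headers were requested; with the prefix, each column name is unique.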

CamSlatyer commented 6 months ago

Hi Adam - I ran a download earlier this week and the fields are still identical - was this implemented today? Otherwise I think it needs to be kept open. Best wishes Cam

adam-collins commented 6 months ago

Since there is confusion, sure, why not.

Created a milestone for a release, did a release, available for anyone today, closed the milestone for the release. tada.