Closed adam-collins closed 6 months ago
draft issue by @peggynewman
@peggynewman @CamSlayter I have done a short analysis of the issue, responded to questions and require feedback.
Two `scientificName` columns appear with `dwcHeadings=true` because the DwC term for the ALA fields `scientificName` and `raw_scientificName` is the same. This is less of a problem for other raw DwC fields because we do not identify them as such.
`dwcHeadings=true` is currently used in the biocache.ala.org.au `Full Darwin Core` and `Customised download` occurrence downloads.
`headings.csv` provides a mapping between `requested field` and `column name` so that there is a way to compare what was requested with what was returned. Note that downloading through the biocache.ala.org.au UI is not the only way to download records. For example, a user who does a download through the UI and then wants to download a subset of fields through the API can use `headings.csv` to identify the `requested field` for the `column name` of interest. The values are not in all instances identical.
`headings.csv` provides a mapping between `DwC Name` and `column name`. The values are not in all instances identical. This is of use in at least 2 instances:
- Identifying whether a `column name` is a DwC term.
- Identifying the DwC term a `column name` belongs to when `dwcHeadings=true` is not specified.
`headings.csv` provides a mapping between `Field name` and `column name`. This can be used to identify which biocache search field matches a `column name`, which is useful when constructing queries for the UI or API. The values are not in all instances identical.
`citations.csv` and `README.html`. Should we remove them?
A `raw_` prefix that is consistent with the ALA field naming convention, e.g. `dwcHeadings=true` produces the column name `raw_scientificName` when the requested field is `raw_scientificName`.
Hi Adam
Apologies if I am misunderstanding your email but there may be a definitional issue here when the decision was taken to reclassify the field.
If we need to reclassify raw_ScientificName to be compliant with DwC, then the correct DwC field is verbatimIdentification (A string representing the taxonomic identification as it appeared in the original record) - https://dwc.tdwg.org/terms/#identification not scientificName (The full scientific name, with authorship and date information if known. When forming part of a dwc:Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the dwc:identificationQualifier term.) - https://dwc.tdwg.org/terms/#dwc:scientificName
Without understanding any of the programming issues, would it be possible to reclassify raw_ScientificName as verbatimIdentification?
Very happy to discuss with you and Peggy.
Best wishes Cam
From: adam-collins Sent: Thursday, November 9, 2023 11:17 AM To: AtlasOfLivingAustralia/biocache-service Cc: Slatyer, Cam (NCMI, Black Mountain) Subject: Re: [AtlasOfLivingAustralia/biocache-service] Downloads have 2 scientificName fields (Issue #856)
To assist in clearing up the misunderstanding: we accept DwC fields from providers and may apply processing. We retain the provided DwC field value and present it on occurrence pages beneath the processed value, and we also provide a button to compare "original vs processed" values, e.g. https://biocache.ala.org.au/occurrence/798c62b1-9196-49f2-8fc4-c69a59a1d95f.
We will not move a provided https://dwc.tdwg.org/list/#dwc_scientificName value into http://rs.tdwg.org/dwc/terms/verbatimIdentification as it may replace a provided verbatimIdentification and removes the ability to identify the provided scientificName.
Agree with Adam. We don't move values from one DwC term to another (e.g. scientificName to identification). Theoretically the data provider (or the data team) puts all the terms into the system, and the system only maps from `raw_` values to processed values.
Thanks for the long explanation Adam. Do we want to make big changes to Downloads? My take on this would be that Downloads should export DwCA or frictionless data packages, and that we should have a data dictionary containing all of our terms with URIs and descriptions (biocache fields would probably be fine) which would be included in the meta.xml. Users should be able to specify clearly whether they want raw and/or interpreted fields (selecting 'Miscellaneous' isn't clear).
I just did a download specifying 'Miscellaneous' to get the raw values and can't find raw_scientificName. The `scientificName` field is called `species` for some reason but `raw_species` is empty.
Applying the prefix `raw_` on all raw fields in a download when `dwcHeaders=true`, e.g. `raw_scientificName`.
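The naming rule described above can be sketched roughly as follows. This is an illustration only, not the biocache-service implementation: the `DWC_TERMS` lookup, the `column_name` and `column_names` helpers, and the `prefix_raw` flag are all hypothetical names introduced here to show the behaviour before and after the change.

```python
RAW_PREFIX = "raw_"

# Hypothetical lookup from ALA field name to DwC term (illustrative only,
# not taken from biocache-service).
DWC_TERMS = {
    "scientificName": "scientificName",
    "raw_scientificName": "scientificName",
}


def column_name(field, dwc_headers=True, prefix_raw=True):
    """Return the column heading for a requested field.

    prefix_raw=False mimics the behaviour reported in this issue, where a raw
    field and its processed counterpart share the same DwC heading.
    """
    if not dwc_headers:
        # Without DwC headings the requested field name is used unchanged.
        return field
    term = DWC_TERMS.get(field, field)
    if prefix_raw and field.startswith(RAW_PREFIX):
        # Keep the raw_ prefix so raw and processed columns stay distinct.
        return RAW_PREFIX + term
    return term


def column_names(fields, dwc_headers=True, prefix_raw=True):
    return [column_name(f, dwc_headers, prefix_raw) for f in fields]


requested = ["scientificName", "raw_scientificName"]
before = column_names(requested, prefix_raw=False)  # both headings collide
after = column_names(requested, prefix_raw=True)    # headings are distinct
print(before)
print(after)
```

With the prefix applied, each requested field maps to a unique column heading, which is the property the duplicate `scientificName` columns were missing.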
Pull request https://github.com/AtlasOfLivingAustralia/biocache-service/pull/858
Hi Adam - I ran a download earlier this week and the fields are still identical - was this implemented today? Otherwise I think it needs to be kept open. Best wishes Cam
Since there is confusion, sure, why not.
Created a milestone for a release, did a release, available for anyone today, closed the milestone for the release. tada.
Files in downloads have two fields called `scientificName`. For examples, see https://doi.ala.org.au/doi/50c278f4-d540-4d07-967a-3122883ab50e and https://doi.ala.org.au/doi/01b0738d-272c-4171-a63f-476b06760691
Digging into the headings.csv of the first above (which users are very unlikely to do), one of them is the provided value:
Shouldn't the headings.csv just contain the `Download field name`, `Download field description` and `More Information`? `Column name` looks useful for searching for fields, but unhelpful in this case.
raised by @CamSlatyer
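As a rough sketch of how the `headings.csv` mapping discussed in this thread could be used to spot the problem, the snippet below groups requested fields by column name and reports any column name that more than one requested field maps to. The header labels `Column name` and `Requested field` and the sample rows are assumptions made for illustration; the actual layout of `headings.csv` in a download should be checked before relying on this.

```python
import csv
import io
from collections import defaultdict

# Assumed sample content; real headings.csv files may use different header
# labels and contain more columns.
SAMPLE_HEADINGS_CSV = """Column name,Requested field
scientificName,scientificName
scientificName,raw_scientificName
decimalLatitude,decimalLatitude
"""


def requested_fields_by_column(headings_text,
                               column_key="Column name",
                               field_key="Requested field"):
    """Group requested fields by the column name they were written under."""
    mapping = defaultdict(list)
    for row in csv.DictReader(io.StringIO(headings_text)):
        mapping[row[column_key]].append(row[field_key])
    return dict(mapping)


def ambiguous_columns(mapping):
    """Return only the column names shared by more than one requested field."""
    return {col: fields for col, fields in mapping.items() if len(fields) > 1}


mapping = requested_fields_by_column(SAMPLE_HEADINGS_CSV)
print(ambiguous_columns(mapping))
```

For a real download, the text of `headings.csv` from the archive would be passed in place of the sample string, with `column_key` and `field_key` adjusted to the header labels actually present.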