Closed nickdos closed 2 years ago
Tested on NCI test and looks good.
https://doi-test.ala.org.au/doi/10.80416/ala.581115cc-67af-4047-8b33-27cd64ce45c8
Second record has 2 collectors and the `recordedBy` field shows: `Austin, A.F. | Barnett, A.M.`.
Hi @nickdos - I'll close this one off as being complete historically - recordedBy now appears to contain all collectors in a downloaded set.
@nickdos commented on Thu Sep 23 2021
Reported by a user - https://support.ehelp.edu.au/a/tickets/117223.
See the download at https://doi.ala.org.au/doi/10.26197/ala.c10ba085-7a89-42d7-b899-23af93b75858
Example record is row 5 with UUID `f3bff74c-8966-41c0-b9a1-080b8a78143c`, which shows (column X in CSV): `recordedBy: Barnett, A.M.`
Looking at the record itself, shows a different value:
`raw` version as the parsing that occurs in the processed version is known to be problematic and should only be used for retrieval purposes, not display purposes.
@nielsklazenga commented on Thu Sep 23 2021
This is because `collector` used to be a string, but is now a multi-value string. The same thing causes the square brackets around the collectors in the search results. There is also a `collectors` field. I am not sure which one is the `recordedBy` field, but an API search with `recordedBy` in the field list includes both. I think it would be good if one of them could be a string. If the parsing is the only processing that is done, that could be the provided name string.
@brucehyslop commented on Thu Sep 23 2021
The `collector` and `collectors` are both mappings to the `recordedBy` (Solr) field. Searches on any of these fields will return the same results.
In biocache-service the endpoints behave differently: `/ws/occurrences/search` returns `collector` and `collectors` as multi-value arrays, whereas `ws/occurrence/{id}` returns `recordedBy`, both raw and processed versions, as strings.
@brucehyslop commented on Fri Sep 24 2021
The change made in PR AtlasOfLivingAustralia/biocache-service#698 will fix the issue of only one (the last) entry of the `recordedBy` multi-value field appearing in the download.
Since the download fields are passed from `biocache-hub` when triggering the download, it is relatively easy to resolve via config:
Add `raw_recordedBy` to config `downloads.dwcExtraFields`; this will include both raw and processed fields in the download.
Replace `recordedBy` with `raw_recordedBy` in config `downloads.legacy.defaultFields`.
`downloads.dwcExtraFields` will be added.
@nickdos commented on Fri Sep 24 2021
~DwC download format (list of fields) is dynamically set by parsing the `/ws/index/fields` file and extracting rows with `dwcTerm: "XXX",` attribute. So I'd rather not hack in exceptions.~ Forget that - forgot about the `downloads.dwcExtraFields` config option - can use that.
@nickdos commented on Fri Sep 24 2021
Updated prod load-balanced hubs and ansible inventories. Tried a download and can see it's requesting the `raw_recordedBy` field now:
@nielsklazenga commented on Fri Sep 24 2021
My two-cents' worth is that, since `dwc:recordedBy` is a string and therefore the raw `recordedBy` is always a string, the processed `recordedBy` should be a string as well. If, for some internal purpose, it is necessary to index the parsed values, the multi-value field should be given a different name and should not co-opt the Darwin Core term.
Also, `recordedBy`, `identifiedBy` and `georeferencedBy` are the same type of field (have the same object / range / target), so I think they should be treated consistently in ALA. Yet, `identifiedBy` and `georeferencedBy` are not multi-value, while `recordedBy` is.
This is probably more something for stage 3 of the infrastructure project (?) and we are in the `biocache-hubs` repository, so I will shut up about this now.
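
For readers following the thread, the difference between the reported symptom (only the last collector exported) and the behaviour seen after the fix (all collectors in one string) can be illustrated with a minimal Python sketch. This is not the biocache-service code; the function name `flatten_recorded_by` is hypothetical, and the ` | ` separator is assumed from the test download output quoted earlier in the thread.

```python
from typing import Optional, Sequence, Union


def flatten_recorded_by(
    value: Union[str, Sequence[str], None], sep: str = " | "
) -> str:
    """Return one display string for a possibly multi-value recordedBy.

    Hypothetical helper for illustration only; the separator is an
    assumption based on the download output quoted in this thread.
    """
    if value is None:
        return ""
    if isinstance(value, str):
        # A raw (single string) value is passed through unchanged,
        # apart from surrounding whitespace.
        return value.strip()
    # Keep every collector, in order, skipping blank entries.
    return sep.join(v.strip() for v in value if v and v.strip())


collectors = ["Austin, A.F.", "Barnett, A.M."]

# Reported symptom: only the last entry of the multi-value field is exported.
print(collectors[-1])

# Behaviour after the fix: every collector appears in the exported string.
print(flatten_recorded_by(collectors))
```

Passing a plain string through unchanged mirrors the point made above that the raw `recordedBy` is always a single string.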