Open sat01a opened 11 months ago
Another example has been shared in this record: https://biocache.ala.org.au/occurrences/f6ee114b-1d76-4c4e-93f1-2becfa1e5ef4
The "Recorded by" field is supplied as "Petra Holland" but our processed value is "Petra, Petra".
Due to the large variety of delimiters, abbreviations and name formats in use by data providers, parsing Recorded By is unnecessarily difficult. Putting this into the backlog for now. When there is time it would be worth including a review of all records with unprocessed Recorded By.
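To make the delimiter problem concrete, here is a minimal sketch of splitting a multi-collector string. The delimiter set and the function name `split_collectors` are assumptions for illustration only, not what pipelines does. Commas are deliberately not treated as delimiters, because "Surname, Given" formats would be split incorrectly.

```python
import re

# Assumed delimiter set (pipe, semicolon, ampersand, the word "and").
# Real providers use many more; this is illustrative, not the pipelines logic.
DELIMITERS = re.compile(r"\s*(?:\||;|&|\band\b)\s*")


def split_collectors(raw):
    """Split a raw recordedBy string into individual collector names."""
    parts = (piece.strip() for piece in DELIMITERS.split(raw))
    # Drop empty pieces left by leading/trailing delimiters.
    return [piece for piece in parts if piece]


print(split_collectors("Petra Holland"))
print(split_collectors("Smith, A.; Jones, B."))
```

Even this small set leaves open questions (commas, initials, "et al."), which is part of why keeping the raw value is attractive.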
My preference is to remove the processed version
@peggynewman as discussed, using the raw_recordedBy as recordedBy. Pull request https://github.com/gbif/pipelines/pull/987
To test that this has been applied, https://biocache-test.ala.org.au/fields?filter=recordedBy lists no raw_recordedBy.
A user has reported an issue with the processing of the "Recorded by" field. This record appears to be correct: https://biocache.ala.org.au/occurrences/6536f255-a49e-45b3-ba3c-83c62862102d It shows Thomas Mesaglio as the original value and [Mesaglio, Thomas] as the processed value. On this record, however: https://biocache.ala.org.au/occurrences/04e8e8dd-c8ff-497a-ad48-93a631b373f6 the original value is Louis Gerald O'Neill and the processed value is blank.
Data team investigation suggests it might be due to the apostrophe, or due to the name having more than 2 parts. It's also believed to be an issue in pipelines specifically.
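As an illustration of how either suspected cause could produce a blank value, the sketch below uses a hypothetical, overly strict pattern (`NAIVE_NAME`) that accepts only two purely alphabetic name parts. This is not the actual pipelines code, only a way to reproduce the symptom. `tolerant_process` is likewise an assumed alternative that treats the last token as the surname.

```python
import re

# Hypothetical strict pattern: exactly two alphabetic parts, no punctuation.
# NOT the pipelines implementation; it only mimics the reported behaviour.
NAIVE_NAME = re.compile(r"^([A-Za-z]+) ([A-Za-z]+)$")


def naive_process(raw):
    """Return 'Surname, Given', or '' when the strict pattern does not match."""
    match = NAIVE_NAME.match(raw.strip())
    if not match:
        return ""
    given, surname = match.groups()
    return f"{surname}, {given}"


def tolerant_process(raw):
    """Assumed alternative: last token is the surname, the rest are given names."""
    parts = raw.split()
    if len(parts) < 2:
        # Nothing to reorder; keep the supplied value rather than blanking it.
        return raw.strip()
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(repr(naive_process("Thomas Mesaglio")))
print(repr(naive_process("Louis Gerald O'Neill")))
print(repr(tolerant_process("Louis Gerald O'Neill")))
```

With this strict pattern, the apostrophe and the third name part each cause a blank on their own, so testing names such as "Louis O'Neill" and "Louis Gerald Neill" separately would help narrow down the real cause.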
3,876,171 records have a provided recordedBy but no processed value, so this is a very visible issue.
Raised in https://support.ehelp.edu.au/a/tickets/182572
Reported by @timhicks-ala