rukayaj closed this 6 months ago
From discussion on 10/11 Oct:
We should rewrite this so it uses function calling in two conversations:
System message: You are an expert herbarium label transcription system ... write out the verbatim Darwin Core terms (and the DwC terms which don't have verbatim alternatives) that it finds. [This could be a function call?]
System message: Here are some Darwin Core terms. Can you run some sanity checks and extract some new terms from the verbatim terms, if they have values:
register_dwc, taking a dict argument with verbatim DwC terms as keys.
The actual dataset issues are fixed now and I'm working on the sanity checks, so I'm going to close this.
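For reference, here is a minimal sketch of what the register_dwc function from the discussion could look like. The schema layout follows the JSON Schema style commonly used for function calling; the list of terms, the schema dict, and the handler body are assumptions for illustration, not the project's actual code.

```python
import json

# Hypothetical subset of verbatim Darwin Core terms the model may return.
VERBATIM_TERMS = [
    "verbatimEventDate",
    "verbatimLocality",
    "verbatimElevation",
    "verbatimCoordinates",
    "verbatimIdentification",
]

# JSON-Schema-style description of the function (assumed layout).
REGISTER_DWC_SCHEMA = {
    "name": "register_dwc",
    "description": "Register verbatim Darwin Core terms transcribed from a herbarium label.",
    "parameters": {
        "type": "object",
        "properties": {term: {"type": "string"} for term in VERBATIM_TERMS},
        "additionalProperties": False,
    },
}


def register_dwc(terms: dict) -> dict:
    """Keep only known verbatim terms that have a non-empty string value."""
    cleaned = {}
    for key, value in terms.items():
        if key in VERBATIM_TERMS and isinstance(value, str) and value.strip():
            cleaned[key] = value.strip()
    return cleaned


# Function-call arguments usually arrive as a JSON string.
arguments = json.dumps(
    {"verbatimElevation": " 160 m ", "verbatimLocality": "", "foo": "bar"}
)
print(register_dwc(json.loads(arguments)))  # {'verbatimElevation': '160 m'}
```

Dropping empty and unknown keys here means the second conversation only has to deal with terms that actually have values.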
Some records have a maximum elevation lower than the minimum elevation. For example, in this record: https://www.gbif.org/occurrence/3924429834, the maximum elevation provided is 0 and the minimum is 160. See the list of records concerned here: https://www.gbif.org/occurrence/search?dataset_key=d4b0f477-0ddf-4c47-a1fe-a7ffed28788e&issue=ELEVATION_MIN_MAX_SWAPPED
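A sanity check for this could look roughly like the sketch below. The function name and the choice to swap the values automatically are assumptions; since a 0 may just stand for a missing value, only flagging the record for review might be the safer option.

```python
from typing import Optional, Tuple


def check_elevation(
    min_m: Optional[float], max_m: Optional[float]
) -> Tuple[Optional[float], Optional[float], bool]:
    """Return (minimum, maximum, swapped); swap when the minimum exceeds the maximum."""
    if min_m is not None and max_m is not None and min_m > max_m:
        return max_m, min_m, True
    return min_m, max_m, False


# Values from the example record: minimum 160, maximum 0.
print(check_elevation(160, 0))  # (0, 160, True)
```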
Nine records of the same dataset (DOI 10.15468/ntdjg9) are flagged because the dates provided are unlikely. For example, the year provided for this record https://www.gbif.org/occurrence/3924428991 is "19"; it is likely missing a century or a decade. See all the records flagged here: https://www.gbif.org/occurrence/search?dataset_key=d4b0f477-0ddf-4c47-a1fe-a7ffed28788e&issue=RECORDED_DATE_UNLIKELY
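A check along these lines could catch truncated years before publishing. This is only a sketch: the function name and the 1600 cutoff are assumptions, and GBIF's own rule for RECORDED_DATE_UNLIKELY may differ.

```python
from datetime import date
from typing import Optional

# Assumed lower bound for a plausible collection year.
EARLIEST_PLAUSIBLE_YEAR = 1600


def year_is_plausible(year_text: str, today: Optional[date] = None) -> bool:
    """True when the text parses as a year between the cutoff and the current year."""
    today = today or date.today()
    try:
        year = int(year_text.strip())
    except ValueError:
        return False
    return EARLIEST_PLAUSIBLE_YEAR <= year <= today.year


print(year_is_plausible("19"))    # False
print(year_is_plausible("1987"))  # True
```

The check only reports the problem; it does not try to guess the missing digits.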
Some records from the Khatlon Scientific Center dataset (DOI 10.15468/q2y8b2) have invalid coordinates. For example, the longitude for this record https://www.gbif.org/occurrence/4166474307 is "н4.77", which our system cannot interpret. See the list of the 12 records concerned here: https://www.gbif.org/occurrence/search?dataset_key=4929d4f6-ea8a-40cc-ab3d-0e3a9da01a45&issue=COORDINATE_INVALID.
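Values like this could be caught with a small parsing check, sketched below. The function name is an assumption and only decimal degrees are handled; this is not how GBIF's interpreter works internally.

```python
from typing import Optional


def parse_longitude(text: str) -> Optional[float]:
    """Return the longitude in decimal degrees, or None if it cannot be interpreted."""
    try:
        value = float(text.strip())
    except ValueError:
        return None
    # Rejects out-of-range values; NaN also fails this comparison.
    if not -180.0 <= value <= 180.0:
        return None
    return value


print(parse_longitude("н4.77"))  # None
print(parse_longitude("68.5"))   # 68.5
```

Returning None rather than trying to repair the stray character leaves the correction to whoever can check the original label.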