Open peterdesmet opened 9 years ago
The process is pretty useless if we just use GBIF issues:
IF issue CONTAINS ANY OF (
    RECORDED_DATE_INVALID,
    RECORDED_DATE_MISMATCH,
    RECORDED_DATE_UNLIKELY
)
    THEN category = "Date with issues"
ELSEIF eventDate != ""
    THEN category = "Valuable date (all in ISO8601)"
ELSE
    category = "Date not provided" /* This is just incorrect! See issue #27 */
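As a rough illustration, the rule above could be sketched in Python as follows. The function name `categorize_date` is hypothetical, and the sketch assumes the `issue` field holds its flags as one semicolon-separated string, which should be checked against the actual download.

```python
# Date-related GBIF issue flags named in the process above.
DATE_ISSUES = {
    "RECORDED_DATE_INVALID",
    "RECORDED_DATE_MISMATCH",
    "RECORDED_DATE_UNLIKELY",
}


def categorize_date(issue: str, event_date: str) -> str:
    """Assign a date-quality category to one occurrence record."""
    # Assumption: flags in the issue field are separated by semicolons.
    flags = {flag.strip() for flag in issue.split(";") if flag.strip()}
    if flags & DATE_ISSUES:
        return "Date with issues"
    if event_date.strip():
        return "Valuable date (all in ISO8601)"
    # Known weakness (see issue #27): an empty interpreted eventDate does
    # not prove that the publisher provided no date.
    return "Date not provided"
```

Matching whole flags rather than substrings avoids accidental hits if another issue name ever contains one of these strings.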
@bartaelterman, @niconoe, I need your feedback on this issue:
Do we need verbatim.txt to get a useful eventDate (as GBIF overwrites it without warning in occurrence.txt, see #27; we need to confirm with them that no field in occurrence.txt has the original eventDate)? If so, how challenging is it to loop over that file too?
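For a sense of the effort involved, here is a minimal sketch of looping over both files. It assumes both are tab-delimited with a header row and share a `gbifID` column to join on; the function names are hypothetical.

```python
import csv


def read_event_dates(handle, id_field="gbifID", date_field="eventDate"):
    """Map record id -> eventDate for one tab-delimited file."""
    # QUOTE_NONE: treat quote characters in the data as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[id_field]: (row.get(date_field) or "") for row in reader}


def compare_event_dates(verbatim_handle, occurrence_handle):
    """Yield (id, original eventDate, interpreted eventDate) per record."""
    verbatim = read_event_dates(verbatim_handle)
    interpreted = read_event_dates(occurrence_handle)
    for gbif_id, original in verbatim.items():
        yield gbif_id, original, interpreted.get(gbif_id, "")
```

The files would be opened with something like `open("verbatim.txt", newline="", encoding="utf-8")`. Both files are held in memory here, which is fine for a sketch but may need a streaming approach for large downloads.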
Description
For a given dataset, I want to know how many records have dates. I also want to know how many of those are useful, have issues, and maybe what their precision is. I envision this as a bar chart, where the records are grouped in categories based on the quality of the dates.
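To make the idea concrete, the sketch below tallies records by date precision and renders the counts as a text bar chart. The precision labels and function names are assumptions for illustration, not agreed categories, and the patterns only check the shape of an ISO 8601 string, not whether the date is valid.

```python
import re
from collections import Counter

# Checked in order; "day" also covers values with a time part or a range.
PRECISION_PATTERNS = [
    ("day", re.compile(r"^\d{4}-\d{2}-\d{2}")),
    ("month", re.compile(r"^\d{4}-\d{2}$")),
    ("year", re.compile(r"^\d{4}$")),
]


def date_precision(event_date: str) -> str:
    """Classify an eventDate string by the precision its shape suggests."""
    value = event_date.strip()
    if not value:
        return "none"
    for label, pattern in PRECISION_PATTERNS:
        if pattern.match(value):
            return label
    return "unrecognized"


def tally(event_dates):
    """Count records per precision label."""
    return Counter(date_precision(d) for d in event_dates)


def text_bar_chart(counts, width=40):
    """Render counts as one bar per label, scaled to the total."""
    total = sum(counts.values()) or 1
    lines = []
    for label, count in counts.most_common():
        bar = "#" * round(width * count / total)
        lines.append(f"{label:<14}{bar} {count}")
    return "\n".join(lines)
```

For example, `print(text_bar_chart(tally(dates)))` would give a quick overview for a list of `eventDate` strings before building a real chart.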
Categories (in order of increasing data quality)
Questions
RECORDED_DATE_MISMATCH if day, year, month are correctly provided.
RECORDED_DATE_UNLIKELY also matches invalid dates: 99 XXX 9999
RECORDED_DATE_INVALID, RECORDED_DATE_MISMATCH and RECORDED_DATE_UNLIKELY are of limited use as indicators of data quality. One approach that would provide much more relevant date information is to use the Canadensys Narwhal Processor.
eventDate is provided, GBIF doesn't seem to look in verbatimEventDate or year, month, day. The literal values of those fields are shown on the website though.
Terms we need
Process