globalbioticinteractions / scan

Symbiota Collections of Arthropods Network (SCAN) Registry

"invalid" dates with 00's #2

Open jhpoelen opened 5 years ago

jhpoelen commented 5 years ago

@neilcobb Preliminary integration tests with SCAN show dates with 00's. For indexing, would it be OK to chop dates like:

1896-00-00 -> 1896
1911-12-00 -> 1911-12
1934-00-30 -> 1934

or is it best to leave them out altogether?

example from test report https://travis-ci.org/globalbioticinteractions/scan/jobs/587910909#L432 :

local invalid date string [0000-00-00]
local invalid date string [1896-00-00]
local invalid date string [1905-00-00]
local invalid date string [1908-00-00]
local invalid date string [1911-00-00]
local invalid date string [1911-12-00]
local invalid date string [1912-00-00]
local invalid date string [1918-00-00]
local invalid date string [1926-06-00]
local invalid date string [1933-00-00]
local invalid date string [1933-11-00]
local invalid date string [1934-00-30]
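
For illustration, the truncation rule proposed above could be sketched in Python roughly as follows. This is not elton's actual implementation; the function name `truncate_zero_date` is hypothetical, and returning `None` for `0000-00-00` (no usable year) is an assumption, since that case is not covered by the three examples. Month and day values other than `00` are passed through without calendar validation.

```python
import re

# Matches a YYYY-MM-DD string; "00" placeholders are accepted at this stage.
_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def truncate_zero_date(value):
    """Return the longest usable prefix of a date containing "00" placeholders.

    Hypothetical helper. Returns None when the input is not in YYYY-MM-DD
    form or when no usable year remains (assumed handling of "0000-00-00").
    """
    text = value.strip()
    match = _DATE.match(text)
    if match is None:
        return None
    year, month, day = match.groups()
    if year == "0000":
        return None  # assumption: nothing left to index
    if month == "00":
        return year  # a day without a month is dropped, e.g. 1934-00-30 -> 1934
    if day == "00":
        return f"{year}-{month}"
    return text  # no placeholders; returned unchanged, not validated further


if __name__ == "__main__":
    for raw in ["1896-00-00", "1911-12-00", "1934-00-30", "0000-00-00"]:
        print(raw, "->", truncate_zero_date(raw))
```

Under this sketch a missing month takes precedence over a present day, which matches the `1934-00-30 -> 1934` example.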

neilcobb commented 5 years ago

I am about to head out until tomorrow afternoon. I do not know enough about the dates below to answer your questions.

neilcobb commented 5 years ago

I am back and updated all the collections we serve to iDigBio and GBIF if they are registered.

Regarding the dates below: if these come from multiple collections in SCAN, then I would just leave them out. You do not happen to have a CSV of all the records with invalid date strings?

jhpoelen commented 5 years ago

@neilcobb I am working on providing more details on the records with "invalid" dates. This needs a new feature in GloBI's command-line tool https://github.com/globalbioticinteractions/elton . Opened a related issue at https://github.com/globalbioticinteractions/elton/issues/11 .

neilcobb commented 5 years ago

It would be good to send records with invalid dates to the individual data providers, so they can annotate them or give permission for a batch process that would make their dates valid. That might mean showing only the year, or no date, for some records.

seltmann commented 5 years ago

Symbiota allows 0s to be added to a valid date string, so this will be a systemic issue in Symbiota datasets because people are used to adding them that way, and it is even encouraged in the present Symbiota documentation.

@neilcobb I suggest indexing those records as @jhpoelen suggested:

1896-00-00 -> 1896
1911-12-00 -> 1911-12
1934-00-30 -> 1934

Symbiota documentation

neilcobb commented 5 years ago

ok

jhpoelen commented 5 years ago

@neilcobb @seltmann I've applied the Symbiota "-00" date mappings as suggested. Changes should be available in the next elton release. Keeping this open until that happens.