EDIorg / ECC

ECC = EML Congruence Checker
dataLoadStatus and dateTimes #30

Open mobb opened 4 years ago

mobb commented 4 years ago

Background: Our list of preferred dateTimeFormatString is based on ISO

There are datetime strings in this list that pg (PostgreSQL) does not understand, so a plain to_timestamp('2018-08-08 09:00-08', ...) call does not always work.

Example dataset: https://portal-s.edirepository.org/nis/reportviewer?packageid=knb-lter-ble.9.1, with dateTimeFormatString = YYYY-MM-DDThh:mm-hh. The last -hh is the offset to UTC, and is correct. The error you get is:

ERROR: conflicting values for "mm" field in formatting string
Detail: This value contradicts a previous setting for the same field type.

To see what strings postgres allows, see https://www.postgresqltutorial.com/postgresql-to_timestamp/
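The error is consistent with PostgreSQL reading both MM and the lowercase mm as "month": in its template language, minutes are MI and 24-hour hours are HH24. One possible workaround is to translate the EML format string into a PostgreSQL template before calling to_timestamp. The sketch below is only an illustration, not what PASTA or ECC does. The function name eml_to_pg_format is hypothetical, it handles only a few simple patterns, it assumes TZH/TZM are available (PostgreSQL 11 or later), and the resulting templates have not been checked against a database.

```python
import re

# Time tokens whose meaning differs between ISO-style format strings
# and PostgreSQL templates (assumed mapping, for illustration only).
_TIME_TOKENS = {"hh": "HH24", "mm": "MI", "ss": "SS"}


def eml_to_pg_format(fmt: str) -> str:
    """Translate a simple ISO-style dateTimeFormatString into a
    PostgreSQL to_timestamp template (hypothetical helper)."""
    # Split date and time at the first 'T' or space separator.
    parts = re.split(r"([T ])", fmt, maxsplit=1)
    date_part = parts[0]
    if len(parts) == 1:
        # Date-only strings such as YYYY-MM-DD need no translation.
        return date_part
    sep, time_part = parts[1], parts[2]

    # Peel off a trailing UTC offset such as -hh or +hh:mm.
    tz = ""
    match = re.search(r"[+-]hh(:mm)?$", time_part)
    if match:
        tz = "TZH:TZM" if match.group(1) else "TZH"
        time_part = time_part[: match.start()]

    # Rename the remaining time tokens so "mm" is no longer read as month.
    time_part = re.sub(r"hh|mm|ss", lambda t: _TIME_TOKENS[t.group()], time_part)

    # A literal T must be double-quoted in a PostgreSQL template.
    literal = '"T"' if sep == "T" else " "
    return date_part + literal + time_part + tz
```

With this sketch, the ble.9.1 string YYYY-MM-DDThh:mm-hh would become YYYY-MM-DD"T"HH24:MITZH. Whether that template parses every value in the data table would still need to be tested in postgres itself.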

mobb commented 4 years ago

To test what strings postgres handles, we used the pg-gui:

select to_timestamp('2018-08-08 09:00-08','YYYY-MM-DDThh:mm-hh');

Datasets with dateTimeFormatStrings we may want to test further: ble.9.1, sbc.5001.8

mobb commented 4 years ago

No solution right now. Handling all dateTimes in the preferred list may not be worth the effort, but the dataLoadStatus check is valuable for checking typing (see #25).

mobb commented 4 years ago

Also, the error message from that check could be friendlier, although it would have to be customized for different types of failures.

In the case of ble.9.1, here is Mark's response: "It appears that the warning message is the result of PASTA's inability to transform your datetime format into one that is acceptable by PostgreSQL. PASTA uses PostgreSQL to validate datatypes during the quality check, and the large variety of datetime/timestamp formats are quite difficult to support in this manner. Until a better type checker is implemented, this warning will likely persist. Sorry."

Maybe a general response from dataLoadStatus could be prefixed, e.g.: PASTA uses PostgreSQL in the dataLoadStatus quality check. One purpose is data type checking. Something about your data table caused this step to fail. The error message was ...
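As a rough illustration of the prefix idea, the sketch below wraps a raw database error in the general explanation suggested above. The function name friendly_load_status_message is hypothetical and is not part of PASTA or ECC; the wording is taken from the suggestion in this comment.

```python
# General explanation to put in front of any raw error from the check.
PREFIX = (
    "PASTA uses PostgreSQL in the dataLoadStatus quality check. "
    "One purpose is data type checking. "
    "Something about your data table caused this step to fail. "
    "The error message was: "
)


def friendly_load_status_message(raw_error: str) -> str:
    """Prefix a raw database error with a general explanation
    (hypothetical helper, for illustration only)."""
    # Collapse line breaks so a multi-line ERROR/Detail pair reads as one sentence.
    flattened = " ".join(raw_error.split())
    return PREFIX + flattened
```

Failure-specific wording (for example, a hint about unsupported dateTimeFormatStrings) could later be added on top of the general prefix.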

We want to keep something like dataLoadStatus, however. People have asked for checks to confirm bounds and coverage, and loading into something with functions like min/max/mean would be necessary for that.
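A minimal sketch of what such a bounds check could look like once a column is loaded into a database. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table name, column name, and both function names are hypothetical; the declared bounds would come from the metadata.

```python
import sqlite3


def column_summary(values):
    """Load values into a throwaway table and return (min, max, mean)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data_table (value REAL)")
    con.executemany(
        "INSERT INTO data_table (value) VALUES (?)", [(v,) for v in values]
    )
    row = con.execute(
        "SELECT MIN(value), MAX(value), AVG(value) FROM data_table"
    ).fetchone()
    con.close()
    return row


def within_bounds(values, declared_min, declared_max):
    """Check that the loaded data stay inside the declared bounds."""
    lowest, highest, _mean = column_summary(values)
    return declared_min <= lowest and highest <= declared_max
```

The same query shape (MIN, MAX, AVG) is available in PostgreSQL, so the idea would carry over to the existing load step.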