Closed griffie closed 3 years ago
Branch: data-bucket
Testing Date: 2021-02-10
I have just done some testing on "sample collection date unit" and haven't found any issues with importing, copy-pasting, or using the picklist to enter and validate values. All work fine. No matter how I add the date unit, it appears to automatically reformat the "sample collection date" with "01" pseudo values (before the validation step). E.g. if I import 2020 it becomes 2020-01-01 when I select "year", or if I paste 2020 it becomes 2020-01-01 even if no unit has been selected.
The only usability concern I have is that if someone adds "sample collection date unit" within the DataHarmonizer after already having entered sample collection dates, they could accidentally overwrite values in their "sample collection date". E.g. if I have 2020-02-18 and then accidentally select "month" instead of "year", the date changes to 2020-02-01.
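To make the overwrite concern concrete, here is a minimal Python sketch of the behaviour observed above. It is not DataHarmonizer's actual code; the function name `apply_date_unit` and the lowercase unit strings are assumptions made for illustration only.

```python
from datetime import date


def apply_date_unit(value: str, unit: str) -> str:
    """Collapse a full YYYY-MM-DD date to the chosen unit.

    Hypothetical helper: parts finer than the unit are reset to 01,
    which is how an existing day or month value gets overwritten.
    """
    parsed = date.fromisoformat(value)  # raises ValueError on malformed input
    if unit == "year":
        parsed = parsed.replace(month=1, day=1)
    elif unit == "month":
        parsed = parsed.replace(day=1)
    elif unit != "day":
        raise ValueError(f"unknown date unit: {unit!r}")
    return parsed.isoformat()


# Selecting "month" on an existing full date discards the day.
print(apply_date_unit("2020-02-18", "month"))
```

Because the original day is lost once the unit is applied, picking the wrong unit cannot be undone by simply re-selecting the right one.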
I tested importing with all eligible file types using modified (and updated) versions of the validTestData as well as the test file provided by damion. However, when I did some tests on the modified (and updated) version of the invalidTestData, I noticed 2020 wasn't automatically converting to 2020-01-01. I tried seeing what would happen if I paired 2020 with "year", "month", and "day", and the result was the following:
Not certain why this is happening, but fortunately the validation process will always catch and draw attention to these occurrences.
Edit:
Input: DH1311p_collection-date-unit_test-05 (invalid data - 2020 testing).csv
Output: DH1311p_collection-date-unit_test-05-output (invalid data - 2020 testing).csv
Test Files:
DH Test_2021-02-10 (sample collection date unit).zip
I only saved the output when there were unexpected results; in the future I will include the output regardless of the results.
Edit: "DH" stands for "DataHarmonizer"; "DH1311p" stands for "DataHarmonizer version 0.13.11 pre-release".
So I've made a change so that when a spreadsheet is loaded, the program no longer tries to automatically correct dates into a yyyy-mm-dd format. E.g. "2020" in a date field was getting converted into 2020-01-01 on load, but now it remains 2020. That way a user can manually adjust any date rather than the program making assumptions about what it should be converted to. Such values will trigger a validation error to highlight the ones that need correction.
The reason a "day" setting kept 2020 as-is is that I didn't want to make assumptions about setting the day and month components of what was only a year.
Similarly for "month", it's prompting the user for a month when only a year is given. In that case it assumes the day is 01.
Also, no changes are automatically made any more to month/year/day granularity. (I did this by renaming the "sample collection date unit" field to "sample collection date precision", since the program still invokes the auto-update on any date + unit field.) Instead, any given date is converted to the given date granularity only on export to a particular target database.
I double-checked this (while testing the CanCOGen-vocabulary-fix branch), and "sample collection date precision" combined with "sample received date" behaved as you described when imported and exported.
Attachments: DH-Test_2021-02-21 (CNPHI Export - date precision).zip
The program is invoking the auto-update on any date field, not just the date + unit field pairs.
The following fields have the auto-date formatting to ensure there are values for year, month, and day, but they don't have a paired precision date column to clarify that these are not actually dated "YYYY-01-01".
Attachments: DH-Test_2021-02-21 (CanCOGeN vocabulary fix).zip
The date auto-format function (which would be applied to all dates in a loaded spreadsheet on load) has been removed from date fields across the board, so malformed dates remain as is and are only highlighted when one presses "Validate".
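As a rough illustration of "leave the value alone, flag it on Validate", the following Python sketch reports malformed dates without modifying them. It is not DataHarmonizer's implementation; `is_valid_iso_date`, `find_malformed_dates`, and the list-of-dicts row layout are assumptions made for the example.

```python
import re
from datetime import date

# Strict YYYY-MM-DD shape, ASCII digits only.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)


def is_valid_iso_date(value: str) -> bool:
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects e.g. 2020-02-30
    except ValueError:
        return False
    return True


def find_malformed_dates(rows, date_fields):
    """Return (row index, field name) pairs to highlight; rows are not changed."""
    problems = []
    for index, row in enumerate(rows):
        for field in date_fields:
            value = row.get(field, "")
            if value and not is_valid_iso_date(value):
                problems.append((index, field))
    return problems


rows = [
    {"sample collection date": "2020"},
    {"sample collection date": "2020-02-18"},
]
print(find_malformed_dates(rows, ["sample collection date"]))
```

Empty cells are skipped here on the assumption that required-field checks are handled separately.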
For incomplete collection dates (to year, or to month) we need a "Date Unit" field with values "Year", "Month" and "Day". In the validation step, if a collection date only specifies a month or year, the Date Unit field will specify that. Then the DH should automatically fill in the rest of the missing date parts with "01" so that the date can be accepted by downstream programs that require year-month-day (YYYY-MM-DD). In the export file for CNPHI, the Date Unit field should be called Precision. We'll map that once the DH adopts the changes above.
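The padding rule described above could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function name `complete_partial_date` is hypothetical, and deriving the unit from the shape of the value (rather than reading it from a separate Date Unit column) is a simplification.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD with ASCII digits.
PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?", re.ASCII)


def complete_partial_date(value: str):
    """Return (YYYY-MM-DD string, unit) for a year, year-month, or full date.

    Missing month and day parts are filled with "01"; the unit records
    which parts were actually supplied.
    """
    match = PARTIAL_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a recognisable date: {value!r}")
    year, month, day = match.groups()
    unit = "Day" if day else "Month" if month else "Year"
    padded = f"{year}-{month or '01'}-{day or '01'}"
    date.fromisoformat(padded)  # raises ValueError for impossible dates
    return padded, unit


for raw in ("2020", "2020-02", "2020-02-18"):
    print(raw, "->", complete_partial_date(raw))
```

Keeping the unit next to the padded date is what lets a downstream consumer tell a genuine January 1st from a year-only value.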