Closed loripusey closed 2 years ago
Some initial notes:
CSVSerializer.kt
is validating elements in the mapRow
method on line 343 when when it calls element.checkForError
. As far as I can see, HLSerializer.kt
is not utilizing this function in its equivalent (or anywhere in this file).
Also, to answer Lori's question about inheritance, they are not inheriting from a parent Serializer
class that share functions.
The checkForError
method is in the Element.kt
, which means that the validation is being done on the Element
level, it is only being utilized on the Serializer level, which means we can possibly use this function in the HL7Serializer
in the same way
DATE
type is in Element.kt
from lines 406 - 426. It has 3 ways of trying to parse out a date. Each block is a try/catch
block. In other words, it try
s to parse a date, return
s null
if it does, and if it doesn't, it goes into the catch
block and moves on to the next try
block.
a. The first try
block does LocalDate.parse(formattedValue)
, where formattedValue
is the parameter sent in to checkForError
, which we are assuming is just a String
. The LocaDate.parse
method is a Java method that uses parse
with the DateTimeFormatter.ISO_LOCAL_DATE
which isyyyy-MM-dd
ex: 2021-11-15
b. The second try
block sees if there is a format
given in the parameters of the checkForError
method and then uses that to parse the date
if format
is null
, it uses a standard datePattern
, which is defined as "yyyyMMdd"
on line 881
c. The third try
block uses a "variableDateTimePattern
" which is this format: "[yyyyMMddHHmmssZ][yyyyMMddHHmmZ][yyyyMMddHHmmss]"
, and then created a DateTimeFormatter
using that pattern. It then uses the parseBest
method of DateTimeFomatter
, using the offset variableDateTimePattern
and attempts to parse out a date from it.
d. If none of these parse out a date, the method sends back an InvalidDateMessage
, which can be found in ElementErrors.kt
lines 23 - 46.
4.DATETIME
type is in Element.kt
from 428 - 466. It has 5 ways of trying to parse out a date. Each block is a try/catch
block, just like the DATE
example previously.
a. The first try
block does OffsetDateTime.parse(formattedValue)
, which uses the ISO format to parse out a date from a string i.e.: yyyy-MM-dd'T'HH:mm:ssZZZZZ
b. The second try
block sees if there is a format
given in the parameters of the checkForError
method and then uses that to parse the datetime
if the format
is null
, it uses a standard datetimePattern
, which is defined as "[yyyyMMddHHmmssZ][yyyyMMddHHmmZ][yyyyMMddHHmmss]"
on line 883
c. The third try
block tries to parse out a datetime using a LocaDate
pattern in the ReporStream canonical dateFormatter
, which looks like this DateTimeFormatter.ofPattern(datePattern, Locale.ENGLISH)
- this means it is using this date format "yyyyMMdd"
to try to parse out a datetime
d. The fourth try
block tries to parse using the variableDateTimePattern
(akin to its DATE
type counterpart) with the parseBest
method within a timerange
e. The fifth try
block tries to parse using the LocalDate
, instead of the OffsetDateTime
, with a defined format
, or if format
is null
, then the datetimePattern
("[yyyyMMddHHmmssZ][yyyyMMddHHmmZ][yyyyMMddHHmmss]"
)
Right underneath where mapRow
is calling checkForError
, we are calling the toNormalized
method of Element.kt
line 545. This method has almost the exact same code as checkForError
as far as "normalizing" dates go. The difference here is that toNormalized
returns null
in the catch
blocks, otherwise, if the date parses correctly, it returns normalDate.format(dateFormatter)
for DATE
type, and normalDateTime.format(datetimeFormatter)
for DATETIME
type. Also, in the final catch
blocks, it is not using the same InvalidDateMessage
object as checkForError
, instead, it is writing an error: error("Invalid date: '$formattedValue' for element $fieldMapping")
Something to keep in mind: if the Schema definition for an Element defines the Element type as being DATE
and the sender sends a "date" with a "time" on it, it may not be a valid date and the value will be replaced with "**^^validationFail**"
.
@ahay-agile6 Is it your recommendation then that we focus on doing all date validation in the checkForError function that the CsvSerializer uses and abandon use of the mapper that we just built? Also, I would think that we should handle as many date format scenarios as we can think of and handle scenarios where the element is DATE type but user sends a date/time and a DATETIME type where the user sends only a date. I don't think we should throw out or error out on data where there is a legitimate value and all we would have to do is to strip off the time value (for DATE type) or add a midnight time value (for DATETIME type). Thoughts? cc: @MikeC-A6 @rhood23699
@loripusey after speaking with the onboarding team, we came up with these recommendations:
Short Term:
checkForError/toNormalized
overwrite bad dates in the first place.checkForError/toNormalized
methods to handle the DATE
Element type normalization. Add the scenarios of what we know/can predict how senders will manually enter in DATE
s, including with datetimes, and add these senarios to the checkForError/toNormalized
methods.The two recommendations above will solve the data quality issue for dates being check and/or being set to blank when it is a "bad date".
Long Term: The next recommendation is for resolving a few inconsistencies and complexities in the code base that may help bring these types of issues to light
checkForError
exists because a) it is not used in the HL7Serializer
and b) it does not have the same code as toNormalized
, but they are extremely similar (DRY principle)checkForError/toNormalized
in the CSVSerializer
and/or add the checkForError
to the HL7Serializer
if neededhttps://app.zenhub.com/files/304423150/8c349b9a-6d7e-4046-89d9-8aa1d68e8340/download - Has examples of the manual entered date formats that we will need to add, written in Kotlin.
https://app.zenhub.com/files/304423150/93f11950-1c45-45d8-a8c6-f388e1b7924d/download - Example test data file for the Experity/CUC-al-covid-19 schema.
Input the data via the prime
cli
./prime data --input result_files/CUC-Experity-Test2.csv --input-schema Experity/CUC-al-covid-19 --output-dir junk --route
I'll add 3 new tickets for the short & long-term fixes; thanks @ahay-agile6 https://app.zenhub.com/workspace/o/cdcgov/prime-reportstream/issues/3081 https://app.zenhub.com/workspace/o/cdcgov/prime-reportstream/issues/3082 https://app.zenhub.com/workspace/o/cdcgov/prime-reportstream/issues/3083
Problem Statement
We created a custom mapper to handle dates that come from a Sender submission as a string: https://app.zenhub.com/workspaces/experience-607d9d5e68b95200150fec37/issues/cdcgov/prime-reportstream/2246
However, what we didn't know was that the CSV Serializer was doing date validation based on the field being typed as a date (often from the base COVID schema) before the custom mapper would run, thereby, nullifying the effects of the new mapper.
Acceptance Criteria