CanDIG / clinical_ETL_code

A library to convert clinical data in csv format into json documents for CanDIG ingestion
GNU Lesser General Public License v3.0

A question about the date format (REDCap data) #77

Closed rjiang9 closed 2 months ago

rjiang9 commented 2 months ago

Hi Marion,

Another question to bug you with, about preparing the data files (splitting the exported REDCap data file into separate files). Dates are required to be expressed as intervals relative to the reference_date set in the manifest.yml file. My question is:

Do I need to preprocess each date field myself, or will they be taken care of by ETL_code when running CSVConvert?

Thanks @mshadbolt, Ray

mshadbolt commented 2 months ago

Hi Ray,

clinical_ETL_code can convert all the dates to intervals based on the reference date, which should be the earliest date_of_diagnosis for the donor (as you mentioned, this is set in the manifest). It converts this earliest date of diagnosis to 0 and calculates all other dates relative to it, so date_of_birth becomes a negative interval that represents the donor's age at first diagnosis.

In this test data, date_of_death shows that it is also possible to submit an integer directly, representing a day or month interval depending on the donor's date_resolution value. We provided this because some users did not have access to the raw dates and needed to submit the intervals directly.
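
To make the arithmetic concrete, here is a minimal sketch of how a date becomes an interval relative to the reference date. It is not the clinical_etl implementation: the function names `interval_in_days` and `interval_in_months` and the example dates are made up for illustration, and the month calculation (whole calendar months, ignoring the day of the month) is an assumption about how a month-resolution interval could be derived.

```python
from datetime import date


def interval_in_days(event: date, reference: date) -> int:
    """Signed number of days from the reference date to the event."""
    return (event - reference).days


def interval_in_months(event: date, reference: date) -> int:
    """Signed number of calendar months from the reference date to the event.

    Assumption for illustration: the day of the month is ignored.
    """
    return (event.year - reference.year) * 12 + (event.month - reference.month)


# Hypothetical donor: the reference date is the earliest date_of_diagnosis.
reference_date = date(2018, 3, 15)
date_of_birth = date(1960, 3, 15)
date_of_treatment = date(2018, 3, 25)

# The reference date itself maps to 0.
diagnosis_interval = interval_in_days(reference_date, reference_date)

# Dates before the reference date give negative intervals,
# dates after it give positive ones.
birth_interval_days = interval_in_days(date_of_birth, reference_date)
birth_interval_months = interval_in_months(date_of_birth, reference_date)
treatment_interval = interval_in_days(date_of_treatment, reference_date)
```

With these made-up dates, the diagnosis maps to 0, the treatment ten days later maps to 10, and the date of birth maps to a negative number whose magnitude is the donor's age at first diagnosis in days or months.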

So it is up to the user whether clinical_etl calculates the intervals or they submit the intervals as integers directly. Compare these two lines in the test mapping csv: https://github.com/CanDIG/clinical_ETL_code/blob/c5322991b18ad5e29972615a8a52dccc60cad681/tests/test2mohv2.csv#L13-L14

If you have the raw dates, it would be simplest to use the date_interval() method for all the date fields in your csv mapping template so that clinical_etl calculates the intervals for you.

Hope this helps and let me know if you need any further explanation.

rjiang9 commented 2 months ago

Thank you so much for the detailed explanations, Marion. It is very helpful. I appreciate it.