MyDigiTwinNL / CDF2Medmij-Mapping-tool

Tool for transforming Cohort-study Data (CDF) into FHIR/MedMij compliant resource bundles
Apache License 2.0

Additional date info for PATIENTS in the current SQL DB #10

hcadavid closed this issue 5 months ago

hcadavid commented 10 months ago

@hyunho-mo I'm moving the issue you posted on (https://github.com/MyDigiTwinNL/LifelinesDataAccessDocumentation/issues/1#issue-1924309576) to this repository, where we will actually handle it.

Hi Hector, first of all, thank you very much for the kind explanation regarding the use of the FHIR data. This is a great help in understanding how the metadata in CSV format has been converted into FHIR format. It is very nice to see that we can extract the information we need from the FHIR resources simply by using SQL queries.

BTW, I guess you are already working on this aspect, but just let me discuss what additional information will be needed for the PATIENTS table:

- date of inclusion
- date of death
- date of last response

The date of inclusion will be considered the time origin for the analysis, and the time to event is then defined as the interval between the time origin and the date of onset of CVD. The remaining two will be used for right censoring: for patients who have been excluded without a CVD event, the length of follow-up can be calculated as 'min(date_of_death, date_of_last_response, max_follow_up)'. In particular, in my preliminary work, the 'date of last response' was calculated from the time interval (in months) between the last attended questionnaire and the inclusion date, as shown in the global summary file. I think it is necessary to have the above 'date' information in order to develop risk prediction models using the given standardized data access.
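As a minimal sketch of the calculation described above (not the actual analysis code): the function name `follow_up` and its parameters are hypothetical, and it assumes that `max_follow_up` is an administrative cut-off date and that any participant with a known CVD onset date counts as an event.

```python
from datetime import date
from typing import Optional, Tuple


def follow_up(
    inclusion: date,
    cvd_onset: Optional[date],
    death: Optional[date],
    last_response: Optional[date],
    max_follow_up: date,
) -> Tuple[int, bool]:
    """Return (days since inclusion, event observed?) for one participant."""
    if cvd_onset is not None:
        # Event observed: time to event runs from inclusion to CVD onset.
        return (cvd_onset - inclusion).days, True
    # No event: right-censor at the earliest of the known end points.
    candidates = [d for d in (death, last_response, max_follow_up) if d is not None]
    censor_date = min(candidates)
    return (censor_date - inclusion).days, False


# A participant without a CVD event whose last response came one year after inclusion.
print(follow_up(date(2010, 6, 10), None, None, date(2011, 6, 10), date(2020, 1, 1)))
```

The pair returned here (duration, event flag) is the usual input shape for survival models, which is why the censoring date is converted to a duration relative to the inclusion date.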

hcadavid commented 10 months ago

@hyunho-mo, thanks for pointing out the need for these variables. After looking at other studies where these kinds of details (related to the research study itself) need to be standardized, I found that the following FHIR resources would be suitable for representing 'date of inclusion' and 'date of last response' ('date of death' was already included in the existing Patient resource):

ResearchSubject

{
    "resourceType": "ResearchSubject",
    "id": "example",
    "text": {
        "status": "generated",
        "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\"></div>"
    },
    "identifier": {
        "type": {
            "text": "Subject id"
        },
        "value": "nl-core-patient-1234"
    },
    "period": {
        "start": "2010-06-10",
        "end": "2022-06-10"
    },
    "status": "candidate",
    "study": {
        "reference": "urn:uuid:7d01edee-33e5-5115-a5db-5829e35d5999"
    },
    "individual": {
        "reference": "urn:uuid:7d01edee-33e5-5115-a5db-5829e35d5e06"
    }
}

ResearchStudy

{
    "resourceType": "ResearchStudy",
    "identifier": [
        {
            "type": {
                "text": "Study name"
            },
            "value": "LifelinesNL"
        }
    ],
    "title": "Lifelines",
    "status": "completed"
}

I think it will be good to have these details not only for performing these 'time-to-event' calculations but also for telling which study each data point came from (in an eventual 'federated data analysis').

With this, 'date of inclusion' and 'date of last response' would be the 'start' and 'end' properties of 'period'. The date of inclusion is already available in the global_summary. @hyunho-mo, what do you think? The 'date of last response' would then be the date of the last non-skipped assessment (in the raw data, the date registered in the last CSV file where the participant appeared). In the preprocessed data, skipped assessments show up as empty dates, as in the following example, where the last response would be 2002-5.

"date": {"1a":"1992-5","1b":"1995-5","1c":"","2a":"2001-5","2b":"2002-5","3a":"","3b":""},
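A minimal sketch of that rule, assuming the preprocessed dates are available as a dictionary like the one above and that the wave codes ("1a", "1b", ..., "3b") sort chronologically as plain strings; the function name `last_response_date` is hypothetical and not part of the mapping tool.

```python
from typing import Dict, Optional


def last_response_date(dates: Dict[str, str]) -> Optional[str]:
    """Return the date of the last non-skipped assessment, or None if all are empty."""
    # Assumption: wave codes such as "1a" < "1b" < "2a" sort chronologically as strings.
    for wave in sorted(dates, reverse=True):
        if dates[wave]:  # an empty string marks a skipped assessment
            return dates[wave]
    return None


dates = {"1a": "1992-5", "1b": "1995-5", "1c": "", "2a": "2001-5",
         "2b": "2002-5", "3a": "", "3b": ""}
print(last_response_date(dates))  # 2002-5
```

Walking the waves from latest to earliest means a skipped wave in the middle (such as "1c" here) does not affect the result.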

@squareb would this assumption be accurate?

hyunho-mo commented 10 months ago

Hi Hector, thank you very much for taking care of this. The additional variable called 'period' is exactly what I asked for and the 'end' property corresponds to the 'date of last response' I intended.
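For illustration, reading the two dates back from a ResearchSubject resource could look like the sketch below. The helper `study_period` is hypothetical, and it assumes both dates are full `YYYY-MM-DD` values as in the example resource; partial FHIR dates such as `2002-05` would need extra handling.

```python
import json
from datetime import date
from typing import Optional, Tuple


def study_period(resource_json: str) -> Tuple[Optional[date], Optional[date]]:
    """Return (date of inclusion, date of last response) from a ResearchSubject."""
    resource = json.loads(resource_json)
    if resource.get("resourceType") != "ResearchSubject":
        raise ValueError("expected a ResearchSubject resource")
    period = resource.get("period", {})
    start = period.get("start")
    end = period.get("end")
    # Assumption: dates are complete ISO dates (YYYY-MM-DD).
    return (
        date.fromisoformat(start) if start else None,
        date.fromisoformat(end) if end else None,
    )


example = '{"resourceType": "ResearchSubject", "period": {"start": "2010-06-10", "end": "2022-06-10"}}'
inclusion, last_response = study_period(example)
print(inclusion, last_response)  # 2010-06-10 2022-06-10
```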

squareb commented 10 months ago

@hcadavid That would indeed be an accurate representation