Health-Informatics-UoN / Carrot-Mapper

Carrot: Convenient And Reusable Rapid Omop Transformer.
https://carrot4omop.ac.uk
MIT License

Don't include dates #611

Open prquinlan opened 5 months ago

prquinlan commented 5 months ago

Is there any use case where we need the dates in a scan report? We need to know which column contains the dates, but if we don't need the date values themselves, could we just ignore them?

spco commented 5 months ago

How might this be done? We don't know which column in a table is the 'Date event' column until the user sets it after upload and processing. We also don't check whether any incoming data is in a date format: everything is treated as a string. Automatically checking whether values parse as dates could easily produce false positives.
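To make the false-positive concern concrete, here is a minimal sketch of the kind of heuristic that was being weighed. It is not how Carrot works. The function names, the format list, and the 95% threshold are all hypothetical. A column is flagged as date-like only if nearly all of its non-empty values match one of a few strict formats.

```python
from datetime import datetime

# Hypothetical set of strict formats to try; real source data may use others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")


def parses_as_date(value: str) -> bool:
    """Return True if the value matches one of DATE_FORMATS exactly."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def looks_like_date_column(values, threshold=0.95):
    """Flag a column as date-like only if most non-empty values parse as dates."""
    non_empty = [v for v in values if v and v.strip()]
    if not non_empty:
        return False
    hits = sum(parses_as_date(v) for v in non_empty)
    return hits / len(non_empty) >= threshold
```

Even this is fragile. If a compact format such as `"%Y%m%d"` were added to the list, an ID column containing values like `"20230115"` would be flagged as dates. Formats like `"%d/%m/%Y"` and `"%m/%d/%Y"` also cannot be told apart for many values.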

Other ideas very welcome, as it'd be a nice feature if it can be made to work.

At the least, though, perhaps the docs could state explicitly that a Scan Report file works perfectly well with the entire contents of any date fields removed?

erummas commented 5 months ago

I think this is at the discretion of the user. When uploading the scan report, they need to be aware that IDs and dates can, or must, be deleted. Maybe a 'read me' section would help users make these decisions.

spco commented 5 months ago

We came up with lots of ideas in this space, for example:

  1. use the Data Dictionary to denote the Person ID and Date Event columns for each table.
  2. add the ability to mark fields in the Data Dictionary whose values shouldn't be populated (e.g. dates, person IDs, some real-valued fields).
  3. add explicit guidance to the docs to avoid adding dates unless truly required.
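Ideas 1 and 2 could fit together roughly as follows. This is a minimal sketch, not Carrot's actual Data Dictionary format or API. `DictionaryEntry`, its `role` and `suppress_values` fields, and both helper functions are hypothetical. The scan report is simplified to a mapping from `(table, field)` to value frequencies. Suppressed fields keep their entry, so the column is still known, but their values are dropped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DictionaryEntry:
    table: str
    field: str
    # Hypothetical role marker (idea 1): "person_id" or "date_event".
    role: Optional[str] = None
    # Hypothetical flag (idea 2): the field's values should not be populated.
    suppress_values: bool = False


# Simplified scan report: (table, field) -> {value: frequency}
ScanValues = Dict[Tuple[str, str], Dict[str, int]]


def columns_with_role(entries: List[DictionaryEntry], role: str) -> Dict[str, str]:
    """Map each table to the field carrying the given role, e.g. its Date Event column."""
    return {e.table: e.field for e in entries if e.role == role}


def strip_suppressed_values(scan: ScanValues, entries: List[DictionaryEntry]) -> ScanValues:
    """Keep every field in the scan report but drop the values of suppressed fields."""
    suppressed = {(e.table, e.field) for e in entries if e.suppress_values}
    return {
        key: ({} if key in suppressed else dict(values))
        for key, values in scan.items()
    }
```

With this shape, the Person ID and Date Event columns would be known before upload instead of being set afterwards. Their values never need to leave the data partner.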