MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License
2.59k stars 1.52k forks source link

Add vital signs mappings for MIMIC-IV v2.0 - SSSOM format #1417

Open a-chahin opened 1 year ago

a-chahin commented 1 year ago

This pull request adds two mapping csv files for vital signs concepts from the d_items and chartevents tables in MIMIC-IV v2.0. The first file chartevents_to_loinc.csv contains itemid to LOINC mappings. The second file chartevents_to_omop.csv contains itemid to OMOP mappings.

Both files use the simple standard for sharing ontology mappings SSSOM format @sssom

tompollard commented 1 year ago

Thanks @a-chahin! Same comments as: https://github.com/MIT-LCP/mimic-code/pull/1418 and https://github.com/MIT-LCP/mimic-code/pull/1419#issuecomment-1307325463. Also tagging @matentzn in case he would like to add his thoughts.

tompollard commented 1 year ago

Thanks again @matentzn

tompollard commented 1 year ago

@danamouk it looks like your new pull request at https://github.com/MIT-LCP/mimic-code/pull/1660 is duplicating a lot of the work in this pull request. I think it would be best to:

If Northwestern aren't able to provide their mapping in SSSOM format, then we should, if possible, convert to SSSOM format for them. If there are columns that are difficult to populate, then we can leave them empty and complete them later.

danamouk commented 12 months ago

Thanks @matentzn for the feedback! I have updated the mapping justification to semapv:ManualMappingCuration and updated the CSV to TSV file. @tompollard we can merge the changes if everything looks good!

matentzn commented 12 months ago

This looks great!

For the future, you may want to consider to set up a proper mapping registry (https://mapping-commons.github.io/sssom/mapping-commons/). This allows you to get features like mapping set validation (schema validation, and more) as part of your pull requests.

tompollard commented 12 months ago

@matentzn, please could you say more about the following comment?

CSV is not a well defined format, I would suggest TSV.

We've always used CSV as our default format and it is fairly well defined as part of RFC 4180.

@danamouk, TSV is inconsistent with (1) the other data files in this repository and (2) the data files that you will be releasing as part of the Northwestern project.

matentzn commented 12 months ago

It's not a big deal, but we have always pushed for TSV from everything, because, among others, of the need to escape the (frequent) commas in entity labels.. you can scroll through here for details: https://mapping-commons.github.io/sssom/spec/#serialisation

It also has the advantage that standard tools (esp. validators) can process the files more easily, but I guess this is debatable.

tompollard commented 12 months ago

It's not a big deal, but we have always pushed for TSV from everything, because, among others, of the need to escape the (frequent) commas in entity labels

Following the RFC spec by quoting strings (or at least strings with commas) solves the issue of commas in labels. " characters would need escaping, but I'm guessing these don't appear especially frequently in the labels?

Tabs appear pretty frequently in the strings that we work with, and the benefit of commas over tabs is that they are easier to see!

matentzn commented 12 months ago

Sure,sounds good!

If you get make an issue at https://github.com/mapping-commons/sssom/issues explaining why you prefer to, and will, push for CSv, that will be helpful in prioritising respective tool support! Should not be an issue.

Sorry about the suggestion wit TSV!

tompollard commented 12 months ago

If you get make an issue at https://github.com/mapping-commons/sssom/issues explaining why you prefer to, and will, push for CSV, that will be helpful in prioritising respective tool support

Thanks, I'm glad you raised this point and it is helpful to talk this through. It's not so much that I think one is better than the other, more that we made a decision to use CSV in the past and I'm trying to understand whether there are good reasons to switch.

danamouk commented 11 months ago

Thanks @tompollard & @matentzn for your suggestions on our initial vital signs mapping! This will be really helpful as we map the rest of the mappings to SSSOM format. I have retained the csv file format and updated the mapping justification to semapv:ManualMappingCuration.

@tompollard if we are happy with the changes, feel free to merge this pull request #1417 , and close #1660 and we can update any issues or changes that come up in another pull request!