Open a-chahin opened 1 year ago
Thanks @a-chahin! Same comments as: https://github.com/MIT-LCP/mimic-code/pull/1418 and https://github.com/MIT-LCP/mimic-code/pull/1419#issuecomment-1307325463. Also tagging @matentzn in case he would like to add his thoughts.
Thanks again @matentzn
@danamouk it looks like your new pull request at https://github.com/MIT-LCP/mimic-code/pull/1660 is duplicating a lot of the work in this pull request. I think it would be best to:
If Northwestern aren't able to provide their mapping in SSSOM format, then we should, if possible, convert to SSSOM format for them. If there are columns that are difficult to populate, then we can leave them empty and complete them later.
Thanks @matentzn for the feedback! I have updated the mapping justification to semapv:ManualMappingCuration
and updated the CSV to TSV file. @tompollard we can merge the changes if everything looks good!
This looks great!
For the future, you may want to consider to set up a proper mapping registry (https://mapping-commons.github.io/sssom/mapping-commons/). This allows you to get features like mapping set validation (schema validation, and more) as part of your pull requests.
@matentzn, please could you say more about the following comment?
CSV is not a well defined format, I would suggest TSV.
We've always used CSV as our default format and it is fairly well defined as part of RFC 4180.
@danamouk, TSV is inconsistent with (1) the other data files in this repository and (2) the data files that you will be releasing as part of the Northwestern project.
It's not a big deal, but we have always pushed for TSV from everything, because, among others, of the need to escape the (frequent) commas in entity labels.. you can scroll through here for details: https://mapping-commons.github.io/sssom/spec/#serialisation
It also has the advantage that standard tools (esp. validators) can process the files more easily, but I guess this is debatable.
It's not a big deal, but we have always pushed for TSV from everything, because, among others, of the need to escape the (frequent) commas in entity labels
Following the RFC spec by quoting strings (or at least strings with commas) solves the issue of commas in labels. "
characters would need escaping, but I'm guessing these don't appear especially frequently in the labels?
Tabs appear pretty frequently in the strings that we work with, and the benefit of commas over tabs is that they are easier to see!
Sure,sounds good!
If you get make an issue at https://github.com/mapping-commons/sssom/issues explaining why you prefer to, and will, push for CSv, that will be helpful in prioritising respective tool support! Should not be an issue.
Sorry about the suggestion wit TSV!
If you get make an issue at https://github.com/mapping-commons/sssom/issues explaining why you prefer to, and will, push for CSV, that will be helpful in prioritising respective tool support
Thanks, I'm glad you raised this point and it is helpful to talk this through. It's not so much that I think one is better than the other, more that we made a decision to use CSV in the past and I'm trying to understand whether there are good reasons to switch.
Thanks @tompollard & @matentzn for your suggestions on our initial vital signs mapping! This will be really helpful as we map the rest of the mappings to SSSOM format. I have retained the csv file format and updated the mapping justification to semapv:ManualMappingCuration
.
@tompollard if we are happy with the changes, feel free to merge this pull request #1417 , and close #1660 and we can update any issues or changes that come up in another pull request!
This pull request adds two mapping csv files for vital signs concepts from the
d_items
andchartevents
tables in MIMIC-IV v2.0. The first filechartevents_to_loinc.csv
containsitemid
toLOINC
mappings. The second filechartevents_to_omop.csv
containsitemid
toOMOP
mappings.Both files use the simple standard for sharing ontology mappings
SSSOM
format @sssom