callahantiff / PheKnowVec

Translational Computational Phenotyping

TODO - Please Review AMIA Joint Summits Podium Abstract #116

Closed callahantiff closed 5 years ago

callahantiff commented 5 years ago

Conference: 2020 American Medical Informatics Association (AMIA) Informatics Summit
Co-Authors: @jwyrwa @trinklek @LEHunter @mgkahn @tdbennett
Due Date: Thursday, August 15th, 2019 @ 11:59 pm ET

Task: Please review our podium abstract! The two-page podium abstract was written in Overleaf, and the draft can be found here. I have also shared the draft with your email addresses.

Notes:

  1. The abstract only covers the results of the automated mapping and does not include the domain expert-reviewed code sets, which is why the list of authors is not longer.
  2. The draft does not yet contain results or a conclusion. I will be adding this information later today once the code set queries I am running are complete. I enabled tracked changes in Overleaf so please feel free to make your edits within the document. I encourage you to place additional feedback, should you have it, in this issue.

Thanks so much for your help! 🙇‍♀ 😄

jwyrwa commented 5 years ago

Wyrwa, not Wrwya 😬

jwyrwa commented 5 years ago

Looks great to me! My only other suggestion is for the second sentence of the second paragraph: "Unfortunately, most of CPs cannot easily be implemented across different EHR systems because they are tailored to specific source vocabularies (SV)."

trinklek commented 5 years ago

@callahantiff Nicely done! I made some minor edits in Overleaf. This is my first time using Overleaf, so hopefully I did it correctly. Sorry if my comments/suggestions are off track - I am definitely a rookie in this domain, so please dismiss if not pertinent. I wasn't able to provide comments in overleaf, so have a few here:

callahantiff commented 5 years ago


Thank you @jwyrwa! So sorry about the misspelling. I have updated the abstract to reflect that change and this one accordingly. Thank you!

callahantiff commented 5 years ago

@trinklek - Thanks so much for the great and highly detailed feedback! I think you handled Overleaf quite well!


  • You say "there are many strategies one could employ to align the clinical codes provided in a CP definition to a CDM (e.g. exact string- or manual-mapping, similarity algorithm-derived to name a few)." Is "clinical codes" the correct term?

Yep! Or at least it's the terminology that I am trying to use here. I see a comment from you below suggesting that I rearrange a few things to make it clearer what I mean by these terms. Thanks for that suggestion!

  • I generally avoid referring to an author by name in text and just rely on the citation (superscript), but that may be my personal preference. I think you can save quite a bit of space if you make this change.

Good suggestion. Tell made a change where we dropped the date but left the name. If you are OK with this strategy, I will leave it as is (unless we end up needing more space later on).

  • You say "to provide an unbiased and exhaustive examination..." Do you want to soften the assertion that you will have an "unbiased and exhaustive examination?" Maybe rephrase to "more comprehensive" to acknowledge that you will still have some biases and can't ever truly account for all factors?

Good call, this has been updated.

  • You say "...across all clinical domains, effects the creation of patient cohorts for both case and control groups." You refer to “clinical domains” again in the methods. Are the clinical domains the 9 CPs?

By clinical domains I meant types of clinical data (i.e. conditions, medications, labs, and procedures). I have made sure to add this information.

  • Do you need to spell out OMOP and MIMIC with first use?

I think with this crowd and being that this is a podium abstract, we can get away with OMOP and MIMIC 😄

  • In the methods you define “clinical code sets” and “phenotype definitions.” It would be helpful to define these earlier in the background and then it addresses my previous question “clinical codes” above.

I moved both definitions up to the first place they appear.

  • In a couple of places you say 8 phenotypes, but I think it is 9, so I changed it – please correct me if I am wrong, which I know you will!

You're right that we are considering 9 phenotypes, but due to some technicalities, we are not including hypothyroidism in this abstract.

  • Sometimes you use “phenotypes” and sometimes “CP”; are you using them interchangeably? If yes, then use the same term for consistency.

Thanks, I'll make sure this gets cleaned up!

callahantiff commented 5 years ago

Good evening co-authors! Sorry for the delay, but the results have now been added to the abstract. Here is what has been added:

Results
The 36 different mapping strategies were first analyzed using only the clinical codes. The created case patient cohort sizes varied widely across the CPs (size of patient cohort created using only conditions/size of patient cohort created using all clinical domains; a single number means the results were the same): ADHD (CHCO: 17639/4472; MIMIC: 131/58), Appendicitis (CHCO: 4178/2948; MIMIC: 30/23), Crohn's Disease (CHCO: 754/477; MIMIC: 272/189), SCD (CHCO: 333; MIMIC: 13), Sleep Apnea (CHCO: 21631; MIMIC: 2189), SIO (CHCO: 337/108; MIMIC: 48/20), and SLE (CHCO: 446/0; MIMIC: 178/0). The FP and FN error rates ranged from 0-88% and 0-25%, respectively. In both cases, the highest error rates were observed in the ADHD CP when using a fuzzy-matching mapping strategy that included all of the concept’s synonyms and descendants. Next, we analyzed the mapping strategies using only the clinical codes and the phenotype logic. Similar patterns were observed: ADHD (CHCO: 3624/1706; MIMIC: 17/0), Appendicitis (CHCO: 4178/367; MIMIC: 30/18), Crohn's Disease (CHCO: 230/199; MIMIC: 1), SCD (CHCO: 1351/1308; MIMIC: 54/9), Sleep Apnea (CHCO: 21631; MIMIC: 2189), SIO (CHCO: 168/61; MIMIC: 6/0), and SLE (CHCO: 0; MIMIC: 0). The observed FP and FN error rates ranged from 0-49% (ADHD) and 0-37% (Appendicitis), respectively. Similar to using only clinical codes, the fuzzy-matching mapping strategy that included all of the concept’s synonyms and descendants resulted in the highest error rates. In all analyses, an exact mapping strategy that included the children of each clinical concept resulted in the lowest error.

Conclusion
Our preliminary findings using only clinical codes corroborate prior work by Hripcsak et al. When including clinical codes and phenotype definitions, we found that utilizing automated vocabulary mapping strategies resulted in lower FP rates but high FN rates. Work is currently underway to extend these findings by: (1) adding two additional CPs; (2) including new domain expert-verified mapping strategies; and (3) performing expert verification of the resulting patient cohorts.

You'll also note that I have only included 7 phenotypes, which is purposeful.

I'm so sorry for leaving things to the last minute. The abstract is due by 11:59 pm ET, which gives us a little over an hour for any additional reviews. If I don't hear anything, I will assume I have your blessing!

Thanks for all of your help with this!

callahantiff commented 5 years ago

Thanks for everyone's help, the abstract has been submitted! 👏 🎉 🏆

For your records, a PDF of the submitted abstract can be found here: AMIA_Informatics_Summits_2020.pdf.

I will be in touch once I receive a decision!