JoakimEdin / explainable-medical-coding

MIT License
Data split change from your previous study #2

Open abhishekdhankar95 opened 1 week ago

abhishekdhankar95 commented 1 week ago

Hi Joakim,

In your previous paper, "Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study" (2023), you used a split that was stratified according to ICD codes. You also pointed out some deficiencies of another split that is more popular in the community, which was introduced in Mullenbach et al.'s "Explainable Prediction of Medical Codes from Clinical Text" (2018). Mullenbach's (2018) split was not stratified according to ICD codes, which presented some problems when comparing the performance of different models.

The Questions:

  1. Why did you use Mullenbach's (2018) split for this study? Would you say that your criticism of Mullenbach's split still stands?
  2. In your view, should researchers use your split in the 2023 paper if they are merely concerned about creating models that can predict accurate ICD codes, and are not interested in creating an explainable model (yet)?

Thanks!

JoakimEdin commented 6 days ago

Hi @abhishekdhankar95, thank you for your questions!

  1. In this study, we focused on explainability. We used MDACE to evaluate the explanations because it is annotated with evidence spans. MDACE comprises 302 reannotated examples from the MIMIC-III full test set. To avoid having any of these examples in our training set, we decided to train our models on MIMIC-III full.

  2. The advantage of stratified sampling is that it ensures that the frequency of each code is similar in the training, evaluation, and test sets. In our previous study, we discovered that most codes in the training set never occurred in the test set and that many codes in the test set never occurred in the training set. If you simply ignore the codes that do not occur in the test set during your evaluation, this problem becomes negligible, and I think stratified sampling becomes less important. That being said, I think a stratified split is better, so if one is available, I would prefer it.
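To illustrate the point about ignoring codes that do not occur in the test set, here is a minimal sketch. It assumes macro-averaged F1 as the metric (the comment above does not name one), and the helper `macro_f1`, the toy documents, and the ICD-9 codes are all hypothetical, not taken from the repository.

```python
def macro_f1(y_true, y_pred, codes):
    """Macro-averaged F1 over `codes`.

    y_true and y_pred are lists of sets of codes, one set per document.
    A code with no true and no predicted occurrences scores 0.0 here
    (an assumption; some libraries handle this case differently).
    """
    scores = []
    for code in codes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if code in t and code in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if code not in t and code in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if code in t and code not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical label space seen during training (illustrative ICD-9 codes).
train_codes = {"401.9", "428.0", "250.00", "V45.81"}

# Hypothetical test set: gold codes and model predictions per document.
y_true = [{"401.9", "428.0"}, {"401.9"}, {"428.0"}]
y_pred = [{"401.9", "428.0"}, {"401.9", "250.00"}, {"401.9"}]

# Codes that actually occur in the test set's gold labels.
test_codes = set().union(*y_true)

# Averaging over every known code drags the score down with codes
# that the test set can never reward.
f1_all = macro_f1(y_true, y_pred, sorted(train_codes | test_codes))

# Averaging only over codes present in the test set removes that effect.
f1_test_only = macro_f1(y_true, y_pred, sorted(test_codes))

print(f"macro F1 over all codes:       {f1_all:.4f}")
print(f"macro F1 over test codes only: {f1_test_only:.4f}")
```

In this toy case, two of the four codes never appear in the test labels, so averaging over all codes halves the score. Note that restricting the code set this way also drops false positives on the excluded codes (here, the spurious `250.00` prediction) from the macro average.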

Makes sense?