abhishekdhankar95 opened this issue 1 week ago (status: Open)
Hi @abhishekdhankar95, thank you for your questions!
In this study, we focused on explainability. We used MDACE to evaluate the explanations because it is annotated with evidence spans. MDACE comprises 302 reannotated examples from the MIMIC-III full test set. To avoid having any of these examples in our training set, we decided to train our models on MIMIC-III full.
The advantage of stratified sampling is that the frequency of each code is similar across the training, evaluation, and test sets. In our previous study, we discovered that most codes in the training set never occurred in the test set and that many codes in the test set never occurred in the training set. If you simply ignore the codes that do not occur in the test set during your evaluation, this problem becomes negligible, and I think stratified sampling then becomes less important. That being said, I think a stratified split is better, so if one is available, I would prefer it.
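To make the point about ignoring absent codes concrete, here is a minimal sketch in plain Python. It is only an illustration, not the evaluation code from either paper: the function names `code_overlap` and `macro_f1_on_test_codes` are hypothetical, and each document's labels are assumed to be a set of ICD code strings.

```python
def code_overlap(train_labels, test_labels):
    """Return (codes only in training, codes only in test).

    Each argument is a list with one set of code strings per document.
    """
    train_codes = set().union(*train_labels)
    test_codes = set().union(*test_labels)
    return train_codes - test_codes, test_codes - train_codes


def macro_f1_on_test_codes(y_true, y_pred):
    """Macro F1 averaged only over codes that occur in the test labels.

    Codes that never appear in y_true are ignored, so predictions of
    such codes do not affect the score.
    """
    test_codes = set().union(*y_true)
    f1_scores = []
    for code in sorted(test_codes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if code in t and code in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if code not in t and code in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if code in t and code not in p)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


# Toy data with made-up code names, purely for illustration.
train = [{"A", "B"}, {"A", "C"}]
test_true = [{"A"}, {"D"}]
test_pred = [{"A", "B"}, {"A"}]

only_train, only_test = code_overlap(train, test_true)
score = macro_f1_on_test_codes(test_true, test_pred)
```

In this toy example, code "B" is predicted but never occurs in the test labels, so it is left out of the average rather than counted as a code with zero F1.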
Makes sense?
Hi Joakim,
In your previous paper, "Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study" (2023), you used a split that was stratified according to ICD codes. You also pointed out some deficiencies of another split that is more popular in the community, which was introduced in Mullenbach et al.'s "Explainable Prediction of Medical Codes from Clinical Text" (2018). The Mullenbach et al. (2018) split was not stratified according to ICD codes and so presented some problems when comparing the performance of different models.
The Questions:
Thanks!