Stanford-AIMI / RRG24

Shared task on Large-Scale Radiology Report Generation @ BioNLP ACL'24
https://stanford-aimi.github.io/RRG24/
MIT License

Empty reports in provided train_mimic.json #9

Closed by oscarloch 4 months ago

oscarloch commented 4 months ago

Hi!

Looking at the train_mimic.json created with the make-interpret-mimic-cxr.py script, I found many empty reports, where both the findings and impression sections are empty. However, the unprocessed reports from the original MIMIC dataset contain either a findings or an impression section. [Image: Screenshot_1]

Is this on purpose? Can we try to replace the missing data? I'm worried that using the current data could teach the models to produce empty reports.
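For reference, here is a minimal sketch of how the empty entries could be counted and filtered out before training. It assumes that train_mimic.json is a top-level list of objects with `findings` and `impression` keys. That layout is an assumption and has not been checked against the script's actual output, and the helper names (`load_records`, `is_empty_report`, `split_reports`) are hypothetical.

```python
import json


def load_records(path):
    """Load the records; assumes the file holds a top-level JSON list."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def is_empty_report(record, fields=("findings", "impression")):
    """True when every listed section is missing, None, or whitespace-only."""
    return all(not str(record.get(field) or "").strip() for field in fields)


def split_reports(records):
    """Return (non-empty records, number of empty records dropped)."""
    kept = [r for r in records if not is_empty_report(r)]
    return kept, len(records) - len(kept)


# Toy records standing in for the real file (key names are assumed).
sample = [
    {"findings": "No acute process.", "impression": ""},
    {"findings": "", "impression": None},
    {"findings": "  ", "impression": "Stable."},
    {},
]

kept, dropped = split_reports(sample)
print(f"kept {len(kept)} of {len(sample)} reports, dropped {dropped} empty")
```

On the real file this would be `kept, dropped = split_reports(load_records("train_mimic.json"))`. A record is kept as soon as one of the two sections has text, so reports with only findings or only an impression are not discarded.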

Thank you so much!

jbdel commented 4 months ago

Hello,

Some reports in MIMIC are indeed empty or non-existent. I don't have the exact number in mind, but around 10K seems like a reasonable estimate.

JB