Markin-Wang opened this issue 7 months ago
Thanks for your attention. I had the same concern about fair comparison, since previous work has not released the CheXpert 5x200 dataset. To minimize random variance, I performed five online samplings of the CheXpert 5x200 dataset, using the random seeds [114514, 114518]. I have also provided an example on Google Drive; please note that the 'image_path' entries in that file are specific to my machine and will need to be modified accordingly. The current checkpoint may yield slightly different performance from the results reported in the paper, since we re-trained it for the open-source release.
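For reference, the five-subset sampling described above can be sketched as follows. This is not the authors' actual preprocessing code: the class names, the synthetic IDs, and in particular the assumption that "[114514, 114518]" means five consecutive seeds are all my own guesses; in the evaluation protocol, accuracy would then presumably be averaged over the five subsets.

```python
import numpy as np

# ASSUMPTION: the issue mentions five samplings with seeds "[114514, 114518]";
# we guess this means the consecutive range below. The class names are the
# usual five CheXpert competition findings, also an assumption here.
CLASSES = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]
SEEDS = [114514, 114515, 114516, 114517, 114518]

def sample_5x200(image_ids_by_class, seed):
    """Draw 200 image IDs per class without replacement, reproducibly."""
    rng = np.random.default_rng(seed)
    return {cls: sorted(rng.choice(image_ids_by_class[cls], size=200, replace=False))
            for cls in CLASSES}

# Toy demo with synthetic IDs standing in for real CheXpert image paths.
toy_ids = {cls: [f"{cls}/img_{i:04d}.jpg" for i in range(300)] for cls in CLASSES}
subsets = [sample_5x200(toy_ids, s) for s in SEEDS]  # five 5x200 subsets
print(len(subsets), {c: len(v) for c, v in subsets[0].items()})
```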
For the retrieval task, I am sorry that I cannot share the specific reports with you, due to the strict MIMIC-CXR license. However, based on our observations, we have not found any significant performance differences among different reports sampled from MIMIC-CXR within a given class. I hope this finding helps.
Thank you so much for your reply and for this finding. Yes, I have obtained the MIMIC-CXR license and fully understand the restriction on sharing the full reports. I wonder if it would be possible to share a file with IDs only (similar to the one you provided on Google Drive), e.g., the CheXpert image IDs and the study/subject IDs of the selected reports. That way, there would be no need to share the full reports.
I am grateful for your kind support.
Best Regards
Thank you for your understanding. Since I only stored the entire report during the preprocessing stage (and online), it may be challenging for me to recover the patient indices. However, I will try to match the reports to find their original IDs. Alternatively, you can reproduce the sampling process yourself with sklearn.utils.shuffle(all_ids) and take the first 200 reports for each class. (The performance gap is minimal.)
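The shuffle-then-take-200 recipe described above can be sketched like this. `sample_per_class`, the toy class names, and the synthetic IDs are hypothetical stand-ins for the real MIMIC-CXR report IDs grouped by condition label; only the use of `sklearn.utils.shuffle` comes from the comment itself.

```python
from sklearn.utils import shuffle

def sample_per_class(ids_by_class, n_per_class=200, seed=0):
    """Shuffle each class's report IDs and keep the first n_per_class."""
    sampled = {}
    for cls, ids in ids_by_class.items():
        # random_state makes the shuffle reproducible across runs.
        shuffled = shuffle(list(ids), random_state=seed)
        sampled[cls] = shuffled[:n_per_class]
    return sampled

# Toy example: synthetic study IDs standing in for MIMIC-CXR reports.
ids_by_class = {cls: [f"{cls}_{i}" for i in range(500)] for cls in
                ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]}
subset = sample_per_class(ids_by_class, n_per_class=200, seed=42)
print({cls: len(ids) for cls, ids in subset.items()})
```

With a fixed `random_state`, anyone holding the same MIMIC-CXR license can regenerate the same 200-per-class subset without the reports ever being shared.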
I found that I have already uploaded the CheXpert 5x200 dataset to GitHub. I will upload a similar retrieval dataset once I find the IDs.
Thank you so much for your help. It will be very helpful for my future research!
Hi, may I ask which split of MIMIC-CXR the reports come from, train or test? Also, could you kindly release the code for the image-to-text retrieval? Thank you for your kind help.
We sampled the reports from the training set in MIMIC-CXR.
Thank you for your reply.
Hi, thanks for your work and code.
Could you kindly release the CheXpert 5x200 dataset used in your work? Since the text is randomly selected, it would be difficult to ensure a fair comparison for future works without knowing the details of this dataset.
Thank you for your kind help.
Best Regards.