Markin-Wang opened this issue 7 months ago
Thanks for your attention. I had the same concern about fair comparison, since previous work has not released the CheXpert 5x200 dataset. To minimize random variance, I performed five online samplings of the CheXpert 5x200 dataset, using the random seeds [114514, 114518]. I have also provided an example on Google Drive; please note that the 'image_path' entries in that file are specific to my machine and will need to be modified accordingly. The current checkpoint may yield slightly different performance from the results reported in the paper, since we re-trained it for the open-source release.
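For reference, the five-subset sampling described above can be sketched as follows. This is not the authors' actual preprocessing code: the class names, the synthetic IDs, and in particular the assumption that "[114514, 114518]" means five consecutive seeds are all my own guesses; in the evaluation protocol, accuracy would then presumably be averaged over the five subsets.

```python
import numpy as np

# ASSUMPTION: the issue mentions five samplings with seeds "[114514, 114518]";
# we guess this means the consecutive range below. The class names are the
# usual five CheXpert competition findings, also an assumption here.
CLASSES = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]
SEEDS = [114514, 114515, 114516, 114517, 114518]

def sample_5x200(image_ids_by_class, seed):
    """Draw 200 image IDs per class without replacement, reproducibly."""
    rng = np.random.default_rng(seed)
    return {cls: sorted(rng.choice(image_ids_by_class[cls], size=200, replace=False))
            for cls in CLASSES}

# Toy demo with synthetic IDs standing in for real CheXpert image paths.
toy_ids = {cls: [f"{cls}/img_{i:04d}.jpg" for i in range(300)] for cls in CLASSES}
subsets = [sample_5x200(toy_ids, s) for s in SEEDS]  # five 5x200 subsets
print(len(subsets), {c: len(v) for c, v in subsets[0].items()})
```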
For the retrieval task, I am sorry that I cannot share the specific reports with you, due to the strict MIMIC-CXR license. However, based on our observations, we have not found any significant performance differences among different reports sampled from MIMIC-CXR within a given class. I hope this finding helps.
Thank you so much for your reply and for this finding. Yes, I have obtained the MIMIC-CXR license and fully understand the restriction on sharing the full reports. I wonder if it would be possible to share a file with IDs only (similar to the one you provided on Google Drive), e.g., the CheXpert image IDs and the study/subject IDs of the selected reports. That way, there would be no need to share the full reports.
I am grateful for your kind support.
Best Regards
Thank you for your understanding. Since I only stored the entire report during the preprocessing stage (and online), it may be challenging for me to recover the patient indices. However, I will try to match the reports to find their original IDs. Alternatively, you can reproduce the sampling process yourself with sklearn.utils.shuffle(all_ids) and take the first 200 reports for each class. (The performance gap is minimal.)
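The shuffle-then-take-200 recipe described above can be sketched like this. `sample_per_class`, the toy class names, and the synthetic IDs are hypothetical stand-ins for the real MIMIC-CXR report IDs grouped by condition label; only the use of `sklearn.utils.shuffle` comes from the comment itself.

```python
from sklearn.utils import shuffle

def sample_per_class(ids_by_class, n_per_class=200, seed=0):
    """Shuffle each class's report IDs and keep the first n_per_class."""
    sampled = {}
    for cls, ids in ids_by_class.items():
        # random_state makes the shuffle reproducible across runs.
        shuffled = shuffle(list(ids), random_state=seed)
        sampled[cls] = shuffled[:n_per_class]
    return sampled

# Toy example: synthetic study IDs standing in for MIMIC-CXR reports.
ids_by_class = {cls: [f"{cls}_{i}" for i in range(500)] for cls in
                ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]}
subset = sample_per_class(ids_by_class, n_per_class=200, seed=42)
print({cls: len(ids) for cls, ids in subset.items()})
```

With a fixed `random_state`, anyone holding the same MIMIC-CXR license can regenerate the same 200-per-class subset without the reports ever being shared.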
I found that I have already uploaded the CheXpert 5x200 dataset to GitHub. I will upload a similar retrieval dataset once I find the IDs.
Thank you so much for your help. It will be very helpful for my future research!
Hi, may I ask which split of MIMIC-CXR the reports come from, train or test? Also, could you kindly release the code for the image-to-text retrieval? Thank you for your kind help.
We sampled the reports from the training set in MIMIC-CXR.
Thank you for your reply.
Hi, thanks for your work and code.
Could you kindly release the CheXpert 5x200 dataset used in your work? Since the text is randomly selected, it would be difficult to ensure a fair comparison for future works without knowing the details of this dataset.
Thank you for your kind help.
Best Regards.