Some questions about the data processing

ToniChopp / ECAMP

The official implementation of "ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training"

MIT License

Some questions about the data processing #1

Closed: Eldo-rado closed this issue 9 months ago

Eldo-rado commented 9 months ago

👋 Hi! Thank you for your contribution; it is really great work. I have two questions regarding data processing:

1. RSNA Pneumonia: In the paper, it is mentioned, "The official data split is followed, with the training/validation/test set consisting of 25,184/1,500/3,000 images, respectively." I checked the RSNA Pneumonia dataset's official website and found that only 25,184+1,500 images have ground truth, and the remaining 3,000 images do not. Where can I find the ground truth for the test set?
2. SIIM-ACR Pneumothorax: When fine-tuning, is the approach the same as with MRM? Do you still include around 5,000 images without pneumothorax lesions for training, or is the final dataset for training reduced to only around 7,000 images (similar to MGCA), excluding those without lesions?

ToniChopp commented 9 months ago

Hi, thanks for your attention to our work.

1. RSNA Pneumonia: We follow MRM to get the ground-truth labels for the test set. The corresponding labels can be accessed here.
2. SIIM-ACR Pneumothorax: We do not follow the approach of MRM, which uses mmsegmentation. Our implementation of the segmentation fine-tuning framework will be released soon. In detail, we only keep positive samples for segmentation (similar to MGCA).

I hope the above info is helpful!
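For readers following along, the "only keep positive samples" step can be sketched roughly as below. This is not the authors' released code. It assumes annotations are available as a mapping from image ID to a list of run-length-encoded masks, and that images without pneumothorax carry the single placeholder value "-1"; both the data layout and the helper name `keep_positive_samples` are hypothetical.

```python
from typing import Dict, List

# Assumption: an image without a pneumothorax lesion is annotated with the
# placeholder string "-1" instead of a run-length-encoded mask.
NEGATIVE_MARKER = "-1"


def keep_positive_samples(annotations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return only the images that have at least one real mask."""
    positives: Dict[str, List[str]] = {}
    for image_id, rles in annotations.items():
        # Drop empty entries and the negative placeholder, tolerating stray spaces.
        masks = [r.strip() for r in rles if r.strip() not in ("", NEGATIVE_MARKER)]
        if masks:
            positives[image_id] = masks
    return positives


# Hypothetical toy annotations, not real dataset entries.
toy = {
    "img_a": ["-1"],
    "img_b": ["10 3 20 5"],
    "img_c": [" -1"],
    "img_d": ["7 2", "40 4"],
}
positive_only = keep_positive_samples(toy)
print(sorted(positive_only))
```

With the toy input, only `img_b` and `img_d` remain, since the other two entries contain nothing but the negative placeholder.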

Eldo-rado commented 9 months ago

Thank you for your prompt response! I can obtain the classification labels for the RSNA test set here, but where should I get the corresponding segmentation labels/masks?

ToniChopp commented 9 months ago

No official segmentation labels have been released for the RSNA test set. Therefore, we follow the split of MGCA and randomly split the original training set into 16,010/5,337/5,337 for training/validation/testing.
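Note that 16,010 + 5,337 + 5,337 = 26,684, which matches the 25,184 + 1,500 labeled images mentioned in the question. A minimal sketch of such a seeded random split is shown below. The seed, the placeholder image IDs, and the function name `random_split` are assumptions for illustration; reproducing the exact MGCA partition would require MGCA's own split procedure or files.

```python
import random
from typing import List, Sequence, Tuple


def random_split(
    items: Sequence[str],
    sizes: Tuple[int, int, int] = (16010, 5337, 5337),
    seed: int = 0,  # assumed seed, not taken from MGCA or ECAMP
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle once with a fixed seed, then cut into train/val/test."""
    if sum(sizes) != len(items):
        raise ValueError(f"sizes sum to {sum(sizes)}, but got {len(items)} items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = sizes
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder IDs standing in for the 26,684 labeled RSNA images.
image_ids = [f"image_{i:05d}" for i in range(16010 + 5337 + 5337)]
train_ids, val_ids, test_ids = random_split(image_ids)
print(len(train_ids), len(val_ids), len(test_ids))  # 16010 5337 5337
```

Because the shuffle uses its own `random.Random(seed)` instance, the split is repeatable and does not depend on the global random state.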

Eldo-rado commented 9 months ago

Thank you, I have a general understanding.

For RSNA classification, you followed the MRM split, but it seems inappropriate to directly use the results reported in the original MGCA paper here, since the MGCA and MRM splits are inconsistent. (I have tried using the MRM split with MGCA, and the metrics show some improvement 😂 )
Furthermore, may I ask about the segmentation metrics on RSNA? The MedKLIP results carry no specific annotation, so it appears that the model was not fine-tuned again and the numbers were taken directly from the original paper. However, I could not find the corresponding values in the MedKLIP paper. Could you please clarify how you obtained these results? Perhaps MedKLIP should also be fine-tuned again, as its split is also unique.

ToniChopp commented 9 months ago

Thank you for your understanding!

You really caught the details. The comparison is not entirely fair. I have rerun the experiment, performing linear classification with MGCA using the MRM data split on RSNA, and got the results below:

| 1% | 10% | 100% |
| --- | --- | --- |
| 88.9 | 90.0 | 90.8 |

For the results of MedKLIP segmentation on RSNA, we obtained them directly from Med-Unic. Since we follow the split of MGCA and Med-Unic for the segmentation task, I think it is appropriate to take the results directly from Med-Unic.

Eldo-rado commented 9 months ago

Got it, thank u ;-)

Eldo-rado commented 9 months ago

Hi, sorry to bother you again. I would like to confirm whether the DQN module was frozen when reproducing KAD in Table 2. Specifically, was only the last linear layer trainable, or was only the backbone (ResNet-50) frozen while the other parts remained trainable?

ToniChopp commented 9 months ago

We conduct linear classification with only the last linear layer trainable. We load only the vision backbone parameters of the officially released KAD model, then freeze the backbone.
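A minimal PyTorch sketch of this linear-classification setup follows. It is not the ECAMP or KAD code: the tiny convolutional backbone is a stand-in for the pre-trained vision backbone, loading the released KAD weights is not shown, and `build_linear_probe`, `feature_dim`, and `num_classes` are hypothetical names chosen for illustration.

```python
import torch
from torch import nn


def build_linear_probe(backbone: nn.Module, feature_dim: int, num_classes: int) -> nn.Module:
    """Freeze the backbone and attach a single trainable linear layer."""
    for param in backbone.parameters():
        param.requires_grad = False  # no gradients flow into the backbone
    # Put the backbone in eval mode. Calling .train() on the returned model
    # later would switch it back, so re-apply backbone.eval() in that case.
    backbone.eval()
    head = nn.Linear(feature_dim, num_classes)
    return nn.Sequential(backbone, head)


# Stand-in backbone (assumption); a real run would load pre-trained weights here.
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
model = build_linear_probe(toy_backbone, feature_dim=8, num_classes=2)

# Only the linear head should be listed as trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-3)

logits = model(torch.randn(4, 3, 32, 32))
print(trainable, tuple(logits.shape))
```

Passing only the parameters with `requires_grad=True` to the optimizer keeps the frozen backbone out of the update step entirely.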