The size of WSI-VQA datasets

cpystan / WSI-VQA

[ECCV 2024] Official implementation of "WSI-VQA: Interpreting Whole Slide Image by Generative Question Answering"


The size of WSI-VQA datasets #2

Closed: Lewislou closed this issue 3 months ago

Lewislou commented 3 months ago

The released WSI-VQA dataset has only 7139 VQA pairs. Why is the size of the dataset different from the one reported in the paper? By the way, I have checked the slide names in TCGA. The total number of TCGA-BRCA slides is 1016 + 2563, and I can only find 541 TCGA-BRCA slides corresponding to the case IDs in WsiVQA.json. Can you provide the slide IDs that you used to create the WSI-VQA dataset?

cpystan commented 3 months ago

WsiVQA.json contains only the training set, which has 7139 pairs. The validation set contains 798 pairs and the test set contains 735 pairs (8672 pairs in total). We will upload the validation and test sets soon. As for the IDs, we use the 'DX' (diagnostic) slides as input. The total number of slides is nearly 1000.
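For readers trying to reproduce the slide/case matching discussed here, the sketch below shows one way to keep only 'DX' slides and map each slide file name to its 12-character TCGA case ID. It assumes standard TCGA slide file names of the form `TCGA-XX-0001-01Z-00-DX1.<uuid>.svs`. The regex, the helper names (`case_id`, `is_diagnostic`, `dx_slides_for_cases`) and the example file names are hypothetical and are not taken from the WSI-VQA code.

```python
import re

# Illustrative helpers (hypothetical, not from the WSI-VQA repository).
# A TCGA slide barcode has the form
#   TCGA-<TSS>-<participant>-<sample+vial>-<portion>-<slide type + index>
# e.g. TCGA-XX-0001-01Z-00-DX1, where "DX" marks a diagnostic (FFPE) slide
# and "TS"/"BS" mark frozen top/bottom section slides.
SLIDE_RE = re.compile(
    r"^(TCGA-[A-Z0-9]{2}-[A-Z0-9]{4})"  # group 1: 12-character case (patient) ID
    r"-\d{2}[A-Z]-\d{2}"                # sample/vial and portion
    r"-([A-Z]{2})\d+"                   # group 2: slide type, e.g. DX, TS, BS
)


def case_id(slide_name: str) -> str:
    """Return the case ID (e.g. TCGA-XX-0001) of a TCGA slide file name."""
    m = SLIDE_RE.match(slide_name)
    if m is None:
        raise ValueError(f"not a TCGA slide name: {slide_name!r}")
    return m.group(1)


def is_diagnostic(slide_name: str) -> bool:
    """True if the slide name parses and its slide type is 'DX'."""
    m = SLIDE_RE.match(slide_name)
    return m is not None and m.group(2) == "DX"


def dx_slides_for_cases(slide_names, case_ids):
    """Keep only DX slides whose case ID is in case_ids, preserving order."""
    wanted = set(case_ids)
    return [s for s in slide_names if is_diagnostic(s) and case_id(s) in wanted]


# Made-up file names, for illustration only.
slides = [
    "TCGA-XX-0001-01Z-00-DX1.example.svs",
    "TCGA-XX-0001-01A-01-TS1.example.svs",
    "TCGA-XX-0002-01Z-00-DX1.example.svs",
    "TCGA-XX-0003-01Z-00-DX1.example.svs",
]
print(dx_slides_for_cases(slides, ["TCGA-XX-0001", "TCGA-XX-0002"]))
# → ['TCGA-XX-0001-01Z-00-DX1.example.svs', 'TCGA-XX-0002-01Z-00-DX1.example.svs']
```

Counting unique case IDs over the DX slides (for example `len({case_id(s) for s in slides if is_diagnostic(s)})`) is one way to compare against the case IDs in WsiVQA.json, since a case can have more than one slide.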

cpystan commented 3 months ago

The slide IDs we use are listed in dataset/splits_0.csv.

Lewislou commented 3 months ago

Thanks for the quick response; that answers my questions.

Lewislou commented 2 months ago

Hi, I noticed in datasets.py that some WSIs in the val or test set of splits_0.csv are filtered out if they share a patient ID with the training set. So in the testing stage, only 86 of the 98 WSIs in splits_0.csv are evaluated?

cpystan commented 2 months ago

Yes, you are right.
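The patient-level filtering confirmed in this exchange can be sketched as follows. This is not the code in datasets.py; it is a minimal illustration assuming that splits_0.csv has `train`, `val` and `test` columns of slide IDs (a CLAM-style layout, which is an assumption here) and that a patient ID is the first 12 characters of a TCGA slide ID. The function names (`patient_id`, `read_splits`, `drop_leaked_slides`) and the example IDs are hypothetical.

```python
import csv
import io

# Hypothetical sketch; not the filtering code from the WSI-VQA datasets.py.


def patient_id(slide_id: str) -> str:
    """Assumption: the patient (case) ID is the first 12 characters of a
    TCGA slide ID, e.g. TCGA-XX-0001-01Z-00-DX1 -> TCGA-XX-0001."""
    return slide_id[:12]


def read_splits(csv_text: str) -> dict:
    """Parse a split file assumed to have 'train', 'val' and 'test' columns.
    Columns may have different lengths, so empty or missing cells are skipped."""
    splits = {"train": [], "val": [], "test": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in splits:
            value = (row.get(name) or "").strip()
            if value:
                splits[name].append(value)
    return splits


def drop_leaked_slides(splits: dict) -> dict:
    """Remove val/test slides whose patient also appears in the train split."""
    train_patients = {patient_id(s) for s in splits["train"]}
    return {
        "train": list(splits["train"]),
        "val": [s for s in splits["val"] if patient_id(s) not in train_patients],
        "test": [s for s in splits["test"] if patient_id(s) not in train_patients],
    }


# Made-up split file: patient TCGA-XX-0001 has a second slide in val,
# and patient TCGA-XX-0002 has a second slide in test.
EXAMPLE_CSV = """,train,val,test
0,TCGA-XX-0001-01Z-00-DX1,TCGA-XX-0001-01Z-00-DX2,TCGA-XX-0003-01Z-00-DX1
1,TCGA-XX-0002-01Z-00-DX1,TCGA-XX-0004-01Z-00-DX1,TCGA-XX-0002-01Z-00-DX2
"""

filtered = drop_leaked_slides(read_splits(EXAMPLE_CSV))
print(filtered["val"], filtered["test"])
# → ['TCGA-XX-0004-01Z-00-DX1'] ['TCGA-XX-0003-01Z-00-DX1']
```

Filtering by patient rather than by slide keeps slides from patients already seen in training out of evaluation, which is why the evaluated test set can be smaller than the list in the CSV.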