Closed by Lewislou 3 months ago
The WsiVQA.json only contains the training set, which has 7139 pairs. The validation set contains 798 pairs and the test set contains 735 pairs. We will upload the validation and test sets soon. As for the ids, we use the 'DX' slides as input. The total number of slides is nearly 1000.
The ids we use are included in dataset/splits_0.csv
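For anyone trying to reproduce the slide list before the split files are consulted, here is a minimal sketch of picking the 'DX' slides that belong to a set of case ids. It is not the repository's code. It assumes standard TCGA slide file names such as `TCGA-A1-A0SB-01Z-00-DX1.<uuid>.svs`, where the first 12 characters are the case id; the function name `diagnostic_slides` and the sample file names are made up for illustration.

```python
import re

# Assumed TCGA naming: "TCGA-XX-XXXX-...-DX<n>.<uuid>.svs".
# Group 1 captures the 12-character case (patient) id.
DX_PATTERN = re.compile(r"^(TCGA-[A-Z0-9]{2}-[A-Z0-9]{4})-.*-DX[0-9A-Z]*\.")


def diagnostic_slides(slide_names, case_ids):
    """Keep only 'DX' slides whose case id is in case_ids (hypothetical helper)."""
    wanted = set(case_ids)
    selected = []
    for name in slide_names:
        match = DX_PATTERN.match(name)
        # Non-DX slides (e.g. TS/BS frozen sections) do not match the pattern.
        if match and match.group(1) in wanted:
            selected.append(name)
    return selected


if __name__ == "__main__":
    names = [
        "TCGA-A1-A0SB-01Z-00-DX1.example.svs",
        "TCGA-A1-A0SB-01A-01-TS1.example.svs",
        "TCGA-A2-A04P-01Z-00-DX1.example.svs",
    ]
    print(diagnostic_slides(names, ["TCGA-A1-A0SB"]))
```

The case ids passed in would come from WsiVQA.json or dataset/splits_0.csv; how those files store the ids is not shown in this thread, so loading them is left out.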
Thanks for your quick response. My questions are resolved.
Hi, I noticed in datasets.py that some WSIs in the test or val set in splits_0.csv are filtered out if they share a patient id with the train set. So at the testing stage, only 86 of the 98 WSIs in splits_0.csv are evaluated?
Yes. You are right.
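The patient-level filtering confirmed here can be sketched roughly as follows. This is an illustration only, not the actual logic in datasets.py: it assumes slide ids are TCGA barcodes whose first 12 characters (`TCGA-XX-XXXX`) identify the patient, and the helper names `patient_id` and `drop_overlapping` are hypothetical.

```python
def patient_id(slide_id):
    """Return the patient part of a TCGA barcode (assumed: first 12 characters)."""
    return slide_id[:12]


def drop_overlapping(train_slides, eval_slides):
    """Drop val/test slides whose patient also appears in the train set."""
    train_patients = {patient_id(s) for s in train_slides}
    return [s for s in eval_slides if patient_id(s) not in train_patients]


if __name__ == "__main__":
    train = ["TCGA-A1-A0SB-01Z-00-DX1"]
    test = ["TCGA-A1-A0SB-01Z-00-DX2", "TCGA-A2-A04P-01Z-00-DX1"]
    # The first test slide shares a patient with the train set and is removed.
    print(drop_overlapping(train, test))
```

Filtering at the patient level rather than the slide level avoids evaluating on a patient the model has already seen, which is why the evaluated count can be lower than the number of rows in the split file.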
The released WSI-VQA dataset only has 7139 VQA pairs. Why is the size of the dataset not the same as reported in the paper? Btw, I have checked the slide names in TCGA. The overall number of slides in TCGA-BRCA is 1016+2563, and I can only find 541 slides in TCGA-BRCA that correspond to the case ids in WsiVQA.json. Can you provide the slide ids that you used to create the WSI-VQA dataset?