DearCaat / MHIM-MIL

[ICCV 2023 Oral] Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification

split question #9

Closed lingxitong closed 8 months ago

lingxitong commented 8 months ago

Hello, nice work. I have a question: for the TCGA dataset, slides and cases are not one-to-one, so is the 65:10:25 split done by slide or by case? Do we need to ensure that slides from the same case do not appear in different subsets (train, val, test)?

DearCaat commented 8 months ago

Thanks for your attention. The TCGA split is done by case, following TransMIL and DTFD-MIL. The label.csv for TCGA lists case IDs, not slide IDs, so I can ensure that slides from the same case do not appear in different subsets. See: code of split dataset, codes of load slide features.
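For readers who want to see what a case-level split looks like, here is a minimal sketch. It is not the repository's actual code. The function names (`case_of`, `split_cases`, `assign_slides`) are hypothetical, and it assumes TCGA-style slide barcodes whose first 12 characters identify the case.

```python
import random


def case_of(slide_id):
    # Assumption: TCGA-style barcode whose first 12 characters
    # (e.g. "TCGA-AA-0001") identify the case.
    return slide_id[:12]


def split_cases(case_ids, ratios=(0.65, 0.10, 0.25), seed=0):
    """Shuffle the unique case IDs and cut them into train/val/test."""
    cases = sorted(set(case_ids))
    random.Random(seed).shuffle(cases)
    n_train = round(len(cases) * ratios[0])
    n_val = round(len(cases) * ratios[1])
    return {
        "train": cases[:n_train],
        "val": cases[n_train:n_train + n_val],
        "test": cases[n_train + n_val:],  # remainder goes to test
    }


def assign_slides(slide_ids, case_split):
    """Place every slide in the subset that its case belongs to."""
    split_of_case = {c: name for name, cs in case_split.items() for c in cs}
    slides = {name: [] for name in case_split}
    for slide_id in slide_ids:
        slides[split_of_case[case_of(slide_id)]].append(slide_id)
    return slides
```

Because the shuffle and the cut are applied to case IDs only, all slides of one case necessarily land in the same subset.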

lingxitong commented 8 months ago

Thanks for your quick reply!

akidway commented 5 months ago

Hi @DearCaat, the split ratio is 65:10:25 at the case level. Does the same ratio hold at the file level, or is the file-level ratio not taken into consideration?

DearCaat commented 5 months ago

Hi @akidway, the file ratio is not taken into consideration. Some cases have only one file, while others have more.
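A toy illustration of why the file-level ratio can drift from the case-level one. The IDs are made up, `slide_fractions` is a hypothetical helper, and the first 12 characters of each slide ID are assumed to be the case ID.

```python
from collections import Counter


def slide_fractions(slide_ids, split_of_case):
    """Fraction of slides per subset, given a case-to-subset mapping."""
    # Assumption: the first 12 characters of a slide ID are the case ID.
    counts = Counter(split_of_case[s[:12]] for s in slide_ids)
    total = sum(counts.values())
    return {name: counts[name] / total for name in counts}


# Four made-up cases split 1:1:2 (train:val:test) at the case level.
split_of_case = {
    "TCGA-AA-0001": "train",
    "TCGA-AA-0002": "val",
    "TCGA-AA-0003": "test",
    "TCGA-AA-0004": "test",
}
# Case 0002 has three slides; every other case has one.
slide_ids = [
    "TCGA-AA-0001-01Z-00-DX1",
    "TCGA-AA-0002-01Z-00-DX1",
    "TCGA-AA-0002-01Z-00-DX2",
    "TCGA-AA-0002-01Z-00-DX3",
    "TCGA-AA-0003-01Z-00-DX1",
    "TCGA-AA-0004-01Z-00-DX1",
]
fractions = slide_fractions(slide_ids, split_of_case)
print(fractions)
```

Here the validation subset holds a quarter of the cases but half of the slides, so the slide-level proportions are whatever the case-level split happens to produce.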

akidway commented 5 months ago

Got it. Thank you for the quick reply!