binli123 / dsmil-wsi

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image
MIT License
358 stars, 88 forks

How to deal with multi-label problem? #5

Closed by LITTLEKKKK 10 months ago

LITTLEKKKK commented 3 years ago

Some cancers show several different tumor types within one slide because of tumor heterogeneity. Does this code handle the multi-label problem? If not, how can a multi-label problem be handled with MIL?

binli123 commented 3 years ago

The code works with multi-class labels. The labels need to be presented as binary vectors with one entry per class. For example, [0, 0, 1], [0, 1, 0], and [1, 0, 0] each encode one of three classes. The max-pooling branch pools the instances along each entry of the class vector, the attention scores are computed separately for each class, and the resulting bag representation has a number of entries equal to the number of classes. This bag representation is then projected by a 1D convolution. Please check the example for the TCGA lung cancer dataset.
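To make the data flow described above more concrete, here is a minimal NumPy sketch of one bag passing through a max-pooling stream and a per-class attention stream. It is an illustration under stated assumptions, not the repository's implementation: the function name `dual_stream_bag_scores`, the weight names, the use of raw features as attention values, and the random weights are all hypothetical.

```python
import numpy as np

def dual_stream_bag_scores(feats, w_cls, w_q, w_conv):
    """Toy dual-stream aggregation for a single bag (hypothetical helper).

    feats:  (N, D) instance features
    w_cls:  (D, C) instance classifier weights (max-pooling stream)
    w_q:    (D, Q) query projection weights (attention stream)
    w_conv: (C, C, D) weights standing in for a 1D convolution whose
            kernel spans all D features of the bag representation
    """
    # Stream 1: score every instance for every class, then max-pool per class.
    inst_scores = feats @ w_cls                # (N, C)
    max_scores = inst_scores.max(axis=0)       # (C,)
    critical = inst_scores.argmax(axis=0)      # (C,) top instance per class

    # Stream 2: attention of every instance to each class's top instance.
    queries = feats @ w_q                      # (N, Q)
    sims = queries @ queries[critical].T       # (N, C)
    sims = sims - sims.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(sims)
    attn = attn / attn.sum(axis=0, keepdims=True)  # softmax over instances, per class

    # Bag representation: one attention-weighted feature row per class.
    bag_repr = attn.T @ feats                  # (C, D)

    # Projection: equivalent to Conv1d(C, C, kernel_size=D) applied to (C, D).
    bag_scores = np.einsum("ocd,cd->o", w_conv, bag_repr)  # (C,)
    return max_scores, bag_scores, attn, bag_repr

# Example with 5 instances, 4 features, 3 classes (all values random).
rng = np.random.default_rng(0)
N, D, C, Q = 5, 4, 3, 2
feats = rng.normal(size=(N, D))
max_scores, bag_scores, attn, bag_repr = dual_stream_bag_scores(
    feats,
    rng.normal(size=(D, C)),
    rng.normal(size=(D, Q)),
    rng.normal(size=(C, C, D)),
)
print(max_scores.shape, bag_scores.shape, bag_repr.shape)
```

Both streams return one score per class, so a label vector such as [0, 1, 0] lines up entry by entry with either output.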

LITTLEKKKK commented 3 years ago

Thanks for your answer. I still have some questions. A slide can contain several types of patches, and we choose the highest-rank type as the slide-level label. How does the encoding you describe ([0, 0, 1], [0, 1, 0], [1, 0, 0]) work in that case? Could you explain in more detail? Thanks.

binli123 commented 3 years ago

For example, with three subtypes of cancer, the labels should be prepared as:
[1, 0, 0] -- if the slide contains subtype 1
[0, 1, 0] -- if the slide contains subtype 2
[0, 0, 1] -- if the slide contains subtype 3
[1, 1, 0] -- if the slide contains both subtype 1 and subtype 2
...
[0, 0, 0] -- healthy slide

It might still work if the slide is labeled only according to the highest-rank type. For example, if subtype 1 is higher-rank than subtype 2, a slide that contains both subtype 1 and subtype 2 is also labeled [1, 0, 0] (not [1, 1, 0]).
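The two labeling schemes above can be sketched in plain Python. This is only an illustration: the helper names `multi_hot` and `highest_rank_one_hot` are hypothetical, and the ranking rule (a lower subtype number means a higher rank) is an assumption taken from the subtype 1 versus subtype 2 example.

```python
def multi_hot(present, num_classes=3):
    """Multi-label encoding: entry k-1 is 1 if subtype k is in the slide."""
    return [1 if k in present else 0 for k in range(1, num_classes + 1)]

def highest_rank_one_hot(present, num_classes=3):
    """Single-label alternative: keep only the highest-rank subtype.

    Assumes a lower subtype number means a higher rank.
    """
    if not present:
        return [0] * num_classes  # healthy slide
    return multi_hot({min(present)}, num_classes)

print(multi_hot({1, 2}))             # [1, 1, 0]
print(highest_rank_one_hot({1, 2}))  # [1, 0, 0]
print(multi_hot(set()))              # [0, 0, 0]
```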

LITTLEKKKK commented 3 years ago

Thanks a lot. : )