Thanks for your attention!
This should be a misunderstanding. In the common tasks of computational pathology, including the cancer diagnosis, sub-typing, and prognosis prediction addressed in the paper, the batch size is constant at 1. This is mainly because each sample has a variable number of instances (in the prognosis task, one input sample can contain multiple WSIs).
Some of the code and comments retain the batch-size dimension for the sake of uniformity, but in practice this dimension is always 1, as in the sketch below.
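For illustration, here is a minimal sketch (a hypothetical dataset and loader, not the code in this repository) of why variable-length bags keep the batch size at 1:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class WSIBagDataset(Dataset):
    """Each item is one bag of pre-extracted patch features with its own length."""
    def __init__(self, bags):
        self.bags = bags                  # list of (n_i, C) feature tensors

    def __len__(self):
        return len(self.bags)

    def __getitem__(self, idx):
        return self.bags[idx]             # a single bag with n_i instances

# Bags with different numbers of instances cannot be stacked into one dense batch,
# so the loader is used with batch_size=1.
bags = [torch.randn(n, 512) for n in (1200, 350, 4800)]
loader = DataLoader(WSIBagDataset(bags), batch_size=1, shuffle=True)

for bag in loader:
    print(bag.shape)   # (1, n_i, 512) -- the leading "batch" dimension is always 1
```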
I hope this answer resolves your confusion. Best wishes!
Thanks for your reply! However, I still have some confusion. First, I was confused about `dispatch_weights` and `combine_weights`. Could you help me break down what they do? I may not have found it in the paper. Also, I found some differences between the pseudo-code in the paper and the actual code in this repository, which confuses me. :no_mouth:
The main idea of `dispatch_weights` and `combine_weights` comes from MoE (Mixture of Experts); you can refer to that literature for more information. In my paper, `combine_weights` aggregates all instances in a region to obtain the region's representative features, and `dispatch_weights` assigns the modeled representative features of the region back to each instance. The two weights are feature distributions along two different dimensions, respectively.
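To make the aggregate-then-dispatch idea concrete, here is a minimal sketch with hypothetical shapes and names (not the exact code of this repository):

```python
import torch

# nW regions, sW instances per region, feature dim C, k representative slots per region.
nW, sW, C, k = 4, 16, 64, 2
x = torch.randn(nW, sW, C)                   # instance features grouped by region
logits = torch.randn(nW, sW, k)              # routing scores between instances and slots

# combine_weights: a distribution over the instances of each region (softmax along sW),
# used to aggregate the instances into k representative region features.
combine_weights = logits.softmax(dim=1)                          # (nW, sW, k)
region_feats = torch.einsum('nsc,nsk->nkc', x, combine_weights)  # (nW, k, C)

# ... the region features would then be modeled, e.g. by attention across regions ...

# dispatch_weights: a distribution over the slots for each instance (softmax along k),
# used to assign the modeled region features back to every instance.
dispatch_weights = logits.softmax(dim=-1)                             # (nW, sW, k)
x_out = torch.einsum('nkc,nsk->nsc', region_feats, dispatch_weights)  # (nW, sW, C)
```

In this sketch the same scores are simply normalized along two different dimensions, once for aggregation and once for redistribution.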
Good job! However, I'm confused about CR-MSA. In the rmsa.py file, `attn_regions`, which is the input to the `self.attn = InnerAttention()` layer, has shape `(sW, nW*B, C)`. This means attention scores are computed across all regions within a batch, which may not be reasonable, since regions from different WSIs should not attend to each other.
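For reference, a shape-only sketch of the flow described in this question, using a stand-in `nn.MultiheadAttention` instead of the repository's `InnerAttention` and hypothetical sizes; with the batch size fixed at 1 as noted above, `nW*B == nW`, so attention only ever mixes regions of the same sample:

```python
import torch
import torch.nn as nn

sW, nW, B, C = 49, 8, 1, 512
attn_regions = torch.randn(sW, nW * B, C)    # (sW, nW*B, C), as described in the question

# Stand-in attention module; batch_first=True treats sW as the batch dimension,
# so scores are computed across the nW*B regions.
attn = nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)
out, scores = attn(attn_regions, attn_regions, attn_regions)
print(scores.shape)    # (sW, nW*B, nW*B): attention across regions; with B > 1 this
                       # would indeed mix regions belonging to different WSIs
```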