MIC-DKFZ / nnDetection

nnDetection is a self-configuring framework for 3D (volumetric) medical object detection which can be applied to new data sets without manual intervention. It includes guides for 12 data sets that were used to develop and evaluate the performance of the proposed method.
Apache License 2.0
542 stars · 94 forks

About multi class dataset #207

Closed kimm51 closed 8 months ago

kimm51 commented 11 months ago

Hello,

I'm curious about how the network is fed in nnDetection for multi-class classification. Does it take patches as a stack, or does it only receive information specific to a single region? Also, can we adapt nnDetection for different tasks by changing the loss function? (nnU-Net used MSE loss for reconstruction, for example.)

I would greatly appreciate your assistance in addressing these questions.

Thank you!

mibaumgartner commented 10 months ago

Dear @kimm51 ,

I'm not quite sure if I understand your question.

Analogous to nnU-Net, the network in nnDetection receives an input patch, which is usually a subset of the entire image. No additional information is added to the batch (except the ground truth annotations during training). It is also possible to adapt the loss functions of the network, although this is a bit more involved. Auxiliary tasks such as reconstruction can be added easily to Retina U-Net, which uses the entire decoder to upsample the feature maps back to the full image resolution. Other auxiliary tasks are also possible, but their exact implementation depends on the "level" they operate on: e.g. pixel-level tasks can be added to Retina U-Net at the top of the decoder branch, while object-level auxiliary tasks can be added to the detection head but require additional postprocessing.
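A rough sketch of the pixel-level auxiliary-reconstruction idea described above (illustrative PyTorch only, not actual nnDetection code; `ReconstructionHead` and all shapes are assumptions):

```python
import torch
import torch.nn as nn

class ReconstructionHead(nn.Module):
    """Hypothetical auxiliary head on top of a Retina U-Net-style decoder."""

    def __init__(self, decoder_channels: int, in_channels: int):
        super().__init__()
        # 1x1x1 conv maps full-resolution decoder features back to the
        # input modalities so an MSE reconstruction loss can be computed
        self.proj = nn.Conv3d(decoder_channels, in_channels, kernel_size=1)

    def forward(self, decoder_features: torch.Tensor) -> torch.Tensor:
        return self.proj(decoder_features)

head = ReconstructionHead(decoder_channels=32, in_channels=1)
patch = torch.randn(2, 1, 16, 32, 32)      # (batch, modality, z, y, x)
features = torch.randn(2, 32, 16, 32, 32)  # full-resolution decoder output
recon = head(features)                     # same spatial shape as the patch
aux_loss = nn.functional.mse_loss(recon, patch)  # auxiliary MSE term
```

The auxiliary term would then be weighted and added to the detection losses during training.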

Best, Michael

kimm51 commented 10 months ago

Hello,

For a single class in nnDetection, the network typically receives an input patch, which is usually a subset of the entire image. During training, ground truth annotations are added to the batch, but no additional information is included. This is the standard approach for single-class object detection, which I understand.

Firstly, does nnDetection support multi-class detection? If yes, is this performed by concatenating patches from different classes and using them as input during training? In general, detection tasks often do this by creating a multi-channel input, with each channel corresponding to a different class; the network is then trained on these multi-class patches.

Thank you Michael!

Best,

mibaumgartner commented 10 months ago

Hi,

nnDetection is able to predict multiple classes within a single patch/image (e.g. tumours might be either benign or malignant), but this is independent of the input patch. The only case (I know of) where object representations are stacked is instance segmentation, where each object is represented by its own binary mask with a corresponding class, but this stacking is performed on the ground truth, not on the input patch to the network.
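The ground-truth stacking mentioned above might look roughly like this (illustrative NumPy sketch, not nnDetection's actual data pipeline; the toy volume and class mapping are assumptions):

```python
import numpy as np

# Toy 3D instance-ID map: 0 = background, 1 and 2 are two objects
instance_map = np.zeros((8, 16, 16), dtype=np.int64)
instance_map[2:4, 3:6, 3:6] = 1
instance_map[5:7, 10:14, 10:14] = 2
instance_classes = {1: 0, 2: 1}  # e.g. 0 = benign, 1 = malignant

# One binary mask per object, stacked along a new leading axis,
# plus a per-instance class label
instance_ids = [i for i in np.unique(instance_map) if i != 0]
masks = np.stack([instance_map == i for i in instance_ids])  # (num_objects, z, y, x)
labels = np.array([instance_classes[i] for i in instance_ids])
```

Note that this stack describes the ground truth; the network input stays a plain image patch.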

nnDetection also supports multiple "modalities", e.g. MRI, which require multiple input channels. There, the different modalities are stacked along the channel dimension of the input patch and fed through the network.
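For example, modality stacking along the channel dimension can be sketched as (illustrative NumPy; shapes and sequence names are assumptions):

```python
import numpy as np

# Two co-registered MRI sequences (e.g. T1 and T2) of the same patch
t1 = np.random.rand(32, 64, 64).astype(np.float32)
t2 = np.random.rand(32, 64, 64).astype(np.float32)

# Stack along a new leading channel axis: (channels, z, y, x)
patch = np.stack([t1, t2], axis=0)
```

A CT-only dataset would simply yield a single-channel patch here; the number of channels tracks the modalities, not the classes.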

I am not sure if any of these cases are what you are describing. Would it be possible to provide a reference where the stacking you are referring to is used?

Best, Michael

kimm51 commented 10 months ago

Hello,

I've noticed that in models like YOLO and Faster R-CNN, multi-class training with concatenated patches is commonly employed. To clarify, let's say there are five different pathologies to detect in a volume. Does nnDetection receive five different patches from the distinct classes, concatenate them, and use the merged data as input during training, i.e. create a multi-channel input with each channel corresponding to a different class? I'm curious to know more about this aspect.

Thanks!

mibaumgartner commented 10 months ago

Dear @kimm51,

in nnDetection, the classes (pathologies) do not influence the input to the network. Each input sample (one sample might contain multiple modalities, which are independent of the number of classes, e.g. CT has 1 channel, MRI might have 2 or more) can contain between 0 and N objects, where each object can have a different class. The number of classes thus only influences the output, not the input.
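The "classes affect the output, not the input" point can be illustrated with a generic anchor-based classification head (a hedged PyTorch sketch, not nnDetection's actual head; all numbers are made up):

```python
import torch
import torch.nn as nn

num_anchors, num_classes, feat_channels = 27, 5, 32

# The class count only scales the number of OUTPUT channels of the head;
# the input patch and backbone features are unchanged
cls_head = nn.Conv3d(
    feat_channels, num_anchors * num_classes, kernel_size=3, padding=1
)

features = torch.randn(1, feat_channels, 8, 16, 16)  # from the backbone
logits = cls_head(features)  # per-anchor, per-class scores at each voxel
```

Going from 2 classes to 5 would change only `num_classes` (and hence the logits' channel dimension), never the input tensor.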

I also haven't seen implementations where detection networks receive different inputs based on the presence of classes/pathologies in a patch, since that would imply that the presence of a class is already known a priori during inference. I would be curious about pointers in the literature/code.

Note: I'm always talking about a single sample. During training, batches are used in which multiple samples are concatenated along the batch dimension according to the batch size; this is also independent of the classes, though.

Best, Michael

kimm51 commented 10 months ago

Hello Michael,

I might not have expressed my question clearly. How does the network learn multi-class information? Yes, you emphasized a single input to the network, but for the multi-class case I am confused. Does it receive a single patch as input, or does it generate patches based on the number of classes and learn from them? I'm specifically trying to understand the approach taken for multi-class (n > 2) scenarios. Thank you for your insights.

Best regards.

mibaumgartner commented 10 months ago

Dear @kimm51,

the network receives batches (usually batch size 4) of input patches. These input patches follow a predefined sampling strategy:

Best, Michael

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 8 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.