Closed jaysheth09 closed 1 year ago
Here's a minimal, reproducible example to better understand.
Before the code snippet outlined above is executed (assuming only 10 patches are sampled), the multimodal data patches have shape (10,2,16,16,16) and the labels have shape (10,16,16,16), so both are in the volumetric patch space (3D: 16,16,16). But for the purposes of training, we need to convert each 3D patch segmentation to a point label, where every patch is labelled either 1 (lesion) or 0 (not a lesion). Essentially, this amounts to taking the label of the approximately central voxel, at index (8,8,8), of the 3D patch segmentation and using it as the point classification label. The centre is only approximate because a patch with an even side length of 16 has no exact middle voxel.
You could instead take the median of the whole patch (for binary labels, a majority vote over its voxels) to decide the label, but the central point worked well for our use case.
>>> import numpy as np
>>> X = np.random.uniform(0, 100, size=(10,2,16,16,16))
>>> X.shape
(10, 2, 16, 16, 16) # 3D multimodal patch
>>> Y = np.random.randint(2, size=((10,16,16,16)))
>>> Y.shape
(10, 16, 16, 16) # 3D segmentation label
>>> Y = Y[:, Y.shape[1] // 2, Y.shape[2] // 2, Y.shape[3] // 2]
>>> Y = np.squeeze(Y)
>>> Y.shape
(10,) # point classification label
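To make the two labelling options mentioned above concrete, here is a minimal sketch that puts the central-voxel rule next to a majority-vote rule. The function names `center_voxel_label` and `majority_label` are hypothetical and do not come from patch_dataloader.py; the 0.5 threshold for the majority vote is also an assumption made for illustration.

```python
import numpy as np


def center_voxel_label(seg):
    """Label each patch with the value of its (approximately) central voxel.

    seg has shape (n_patches, D, H, W). Hypothetical helper, not the repo's code.
    """
    cz, cy, cx = (s // 2 for s in seg.shape[1:])
    return seg[:, cz, cy, cx]


def majority_label(seg):
    """Label each patch 1 if more than half of its voxels are lesion, else 0.

    The 0.5 threshold is an assumption chosen for this sketch.
    """
    lesion_fraction = seg.reshape(seg.shape[0], -1).mean(axis=1)
    return (lesion_fraction > 0.5).astype(seg.dtype)


rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(10, 16, 16, 16))  # random binary segmentation patches

y_center = center_voxel_label(Y)
y_major = majority_label(Y)
print(y_center.shape, y_major.shape)
```

Both functions return one label per patch. Note that integer indexing already drops the three spatial axes, so for a (10,16,16,16) input the result has shape (10,) even without np.squeeze; the squeeze would only change anything if the array carried an extra singleton axis.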
@ravnoor, thanks for your support and detailed explanation.
Description: In the patch_dataloader.py file's load_training_data() method, you have performed a squeezing operation on the 'Y' array using the following code:
Query: What is the purpose and necessity of this operation? And why did you select the indices 'Y.shape[1] // 2, Y.shape[2] // 2, Y.shape[3] // 2' from a patch of size (16,16,16)? Your insights would be greatly appreciated.