A question about txt label visualization error

Jiangdiwen commented 2 months ago

Hi, this is nice work!

I have a question about the code for visualizing txt labels: https://github.com/JDSobek/MedYOLO/blob/main/utils3D/nifti_utils.py. Why do we need to swap width and height here? If I use this code to visualize my txt label, the mask is offset, like this... But if they are not swapped, it looks normal...

I am worried whether this difference will affect the correct training of the network.

I would be extremely grateful if you could give me an answer!

JDSobek commented 2 months ago

You might not need to. The way the script is set up is the way that has worked for my tasks. If it generates masks correctly for you without changing anything, everything is probably working OK.

Basically, when I first made the mask generation script, my masks kept coming out rotated (like your first example). MedYOLO does everything with a depth width height ordering (because YOLOv5 uses width height ordering), so I thought the masks would need to be width height depth to accommodate the NIfTI orientation. On top of that, PyTorch Conv3d uses depth height width kernels, which means a lot of reorientation goes on throughout the framework to accommodate the different requirements, and it raises a lot of questions about which orientation the imaging is in at any given point in the pipeline. Ultimately, the way my masks consistently lined up with the objects was in the height width depth orientation you see in nifti_utils.py.
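To make the reordering described above concrete, here is a minimal NumPy sketch. The helper name `reorder_axes` and the axis strings are hypothetical and not part of MedYOLO; it only shows how an array stored as (x, y, z) could be permuted into a depth-first (z, x, y) layout, and it makes no claim about where MedYOLO itself performs such a permutation.

```python
import numpy as np


def reorder_axes(volume, src="xyz", dst="zxy"):
    """Permute the axes of a 3D array from the `src` layout to the `dst` layout.

    `src` and `dst` are strings naming each axis once, e.g. "xyz" -> "zxy".
    This is an illustrative helper, not a MedYOLO function.
    """
    # For each axis wanted in the output, find where it sits in the input.
    perm = [src.index(axis) for axis in dst]
    return np.transpose(volume, perm)


# A deliberately non-cubic volume makes the permutation visible in the shape.
vol = np.zeros((4, 5, 6))  # x=4, y=5, z=6
print(reorder_axes(vol).shape)  # (6, 4, 5)
```

Using a volume with three different side lengths is a cheap way to see which axis ended up where, since the shape alone reveals the ordering.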

However, with all the different orientations required, all my imaging being square in the X-Y plane, and the NIfTI viewers I use either doing automatic reorientation or being ambiguous about whether they reorient imaging, I don't have a good way to figure out whether this is the correct rotation or whether it's canceling out some other incorrect rotation somewhere else. With that ambiguity, I'm unsure whether nifti_utils.py will work universally.
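One way to see why square X-Y imaging makes this hard to diagnose is the small NumPy sketch below. The function `swap_changes_mask` is a hypothetical helper written for this illustration, not something from MedYOLO. It builds a box mask, swaps the first two axes, and reports whether the swap is observable at all.

```python
import numpy as np


def swap_changes_mask(shape, box):
    """Return True if swapping axes 0 and 1 visibly changes a box mask.

    `shape` is the (x, y, z) array shape; `box` holds inclusive (min, max)
    index pairs for axes 0 and 1. The box spans the full z range.
    """
    (x0, x1), (y0, y1) = box
    mask = np.zeros(shape, dtype=bool)
    mask[x0:x1 + 1, y0:y1 + 1, :] = True
    swapped = np.swapaxes(mask, 0, 1)
    # A shape mismatch or a moved box both reveal the swap.
    return swapped.shape != mask.shape or not np.array_equal(swapped, mask)


# Square slice, box symmetric about the diagonal: the swap is invisible.
print(swap_changes_mask((8, 8, 4), ((2, 5), (2, 5))))
# Non-square slice: the swap changes the array shape.
print(swap_changes_mask((8, 12, 4), ((2, 5), (2, 5))))
# Square slice, off-diagonal box: the swap moves the box.
print(swap_changes_mask((8, 8, 4), ((1, 2), (4, 6))))
```

Under these assumptions, a test volume that is non-square in X-Y, or that contains a clearly off-center object, makes a width/height swap much easier to spot than square imaging with roughly centered objects.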

So basically that line is just there in case you notice things aren't lined up correctly but are kind of close to correct (e.g. your first example). While switching that width and height might not fix the problem, I don't understand why I needed to switch them well enough to say it's how it needs to be for everyone's data, so I put a note in case someone has the same problem with the script. nifti_utils.py is mostly meant to be an example of the logic to turn txt labels back into masks; I'm not certain it will work for everyone else's data.

Jiangdiwen commented 2 months ago

This is part of the code I use to generate the txt labels. What I want to ask is whether shape 0, 1, 2 of the NIfTI data corresponds to x, y, z in the txt label template, as well as to width, height, depth. Is my understanding right?

So what I am confused about is that in nifti_utils.py, width corresponds to shape[1], and height corresponds to shape[0]. I think this is the reason for the rotation of the drawn mask.

MedYOLO does everything with a depth width height

Does MedYOLO's ordering match what my txt generation code does? Thanks for your answer!

JDSobek commented 2 months ago

This is the code I used to generate the bounding boxes for the labels for one of my datasets.

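import numpy as np
from typing import List

def Make_BBox(mask_array: np.ndarray, label_list: List[int]):
    # Returns one [label, z_center, x_center, y_center, depth, width, height]
    # entry, normalized by the array shape, for each label value found in the mask.
    bbox_list = []

    # Array axes 0, 1, 2 are treated as x, y, z.
    x_length = mask_array.shape[0]
    y_length = mask_array.shape[1]
    z_length = mask_array.shape[2]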

    for mask_value in label_list:
        if np.any(np.isin(mask_array, [mask_value])):
            xslices = np.any(np.isin(mask_array, [mask_value]), axis=(1,2))
            yslices = np.any(np.isin(mask_array, [mask_value]), axis=(0,2))
            zslices = np.any(np.isin(mask_array, [mask_value]), axis=(0,1))

            truexslices = np.nonzero(xslices)
            trueyslices = np.nonzero(yslices)
            truezslices = np.nonzero(zslices)

            min_x = int(truexslices[0].min())
            max_x = int(truexslices[0].max())
            min_y = int(trueyslices[0].min())
            max_y = int(trueyslices[0].max())
            min_z = int(truezslices[0].min())
            max_z = int(truezslices[0].max())

            width = (max_x - min_x)/x_length
            height = (max_y - min_y)/y_length
            depth = (max_z - min_z)/z_length

            x_center = min_x/x_length + width/2.
            y_center = min_y/y_length + height/2.
            z_center = min_z/z_length + depth/2.   

            bbox_list.append([mask_value, z_center, x_center, y_center, depth, width, height])

    return bbox_list

Which sets each direction the same as you have yours set (shape[0]=x, shape[1]=y, shape[2]=z).

Despite this, for some reason when I convert my text labels (and model predictions) to nifti masks, I needed to make width=shape[1] and height=shape[0] in nifti_utils.py. I also do not understand why I needed to do that. I think it has to do with my imaging natively having the patient facing right and not down, and this somehow not being propagated between the NIfTI imaging and the NIfTI mask.
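For illustration, the inverse step could look roughly like the sketch below. This is not the code in nifti_utils.py: `label_to_mask` and its `swap_wh` flag are hypothetical names, and modelling the swap as "apply the y/height values to axis 0 and the x/width values to axis 1" is an assumption made only to show the effect. It expects a label in the `[class, z_center, x_center, y_center, depth, width, height]` order produced by `Make_BBox` above.

```python
import numpy as np


def label_to_mask(label, shape, swap_wh=False):
    """Draw one normalized box label into a mask of the given (x, y, z) shape.

    Hypothetical helper for illustration; `swap_wh` is an assumed way of
    modelling the width/height swap discussed in this thread.
    """
    cls, zc, xc, yc, depth, width, height = label
    if swap_wh:
        # Assumption: y/height go on axis 0 and x/width on axis 1.
        centers, extents = (yc, xc, zc), (height, width, depth)
    else:
        centers, extents = (xc, yc, zc), (width, height, depth)

    mask = np.zeros(shape, dtype=np.uint8)
    slices = []
    for c, e, n in zip(centers, extents, shape):
        # Undo the normalization; round to absorb floating-point error.
        lo = int(round((c - e / 2) * n))
        hi = int(round((c + e / 2) * n))
        # The upper index is inclusive, so add 1 for the slice end.
        slices.append(slice(max(lo, 0), min(hi, n - 1) + 1))
    mask[tuple(slices)] = int(cls)
    return mask


# Box spanning x 2..5, y 8..15, z 10..19 in a 10 x 20 x 30 volume.
label = [1, 10 / 30 + 0.15, 0.2 + 0.15, 0.4 + 0.175, 0.3, 0.3, 0.35]
plain = label_to_mask(label, (10, 20, 30))
swapped = label_to_mask(label, (10, 20, 30), swap_wh=True)
print(np.array_equal(plain, swapped))
```

On a non-square volume like this one, the two reconstructions differ, so comparing each against the original mask shows which convention a given dataset follows.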

What are you using to view your images/masks? Have you tried viewing the images when you load a NIfTI image, convert a slice to png, and save that png? I know when I do that with my datasets, I typically get a different image than what I see in the NIfTI viewers I use (RILContour/ITK-SNAP/Mango).

There is a function in the dataset that transposes imaging as part of the dataloading code. If your images ultimately do need to be rotated differently for some reason, you can probably do that by making this function or the dataset aware of how your NIfTIs are oriented. https://github.com/JDSobek/MedYOLO/blob/main/utils3D/datasets.py#L497

Although it's easiest to know whether to do this after seeing results from training a model. If you train a model and the only thing required to make the masks line up is to swap width and height in nifti_utils.py, then the model has probably trained correctly.

Jiangdiwen commented 2 months ago

Oh, I use MITK to view the NIfTI files. I didn't read them in and then look at the slices, and that's really one thing that needs to be checked. I will examine the output of the trained model, as well as the orientation of the NIfTI data as read, to determine exactly what method should be used.

Thanks for your answers!

JDSobek / MedYOLO

A question about txt label visualization error #23