JDSobek / MedYOLO

A 3D bounding box detection model for medical data.
GNU Affero General Public License v3.0

Result error - Data format determined #12

Closed: xiongjiuli closed this issue 5 months ago

xiongjiuli commented 6 months ago

Sorry to bother you. This is very confusing to me. After running the program, the results have been wrong. I had never run anchor-based detection methods, including the YOLO series, before this. My run command:

python train.py --data example.yaml --adam --norm CT --epochs 300 --patience 870 --device 0 \
                --weights /public_bme/data/xiongjl/MedYOLO/runs/train/exp26/weights/last.pt \
                --workers 2

My data format and .txt label file are shown below. I only have one class, class indices start at 0, and my objects are relatively small (see the sanity-check sketch after the label lines):

0 0.6804977041015624 0.5620704590657366 0.5184990356691964 0.03726457456081081 0.025558185797142856 0.02
0 0.6623090284258869 0.5626880352657366 0.5034778011549107 0.03971397547297297 0.023087881 0.02
0 0.5772347480204818 0.5669581848085937 0.6031484416977678 0.012808950564189188 0.014285714285714285 0.014285714285714285
0 0.6374013155880489 0.5738044068657365 0.5321950944320536 0.037523249087837836 0.0298812192 0.02
0 0.5020538324799412 0.5756929150085937 0.5794631672491964 0.01911052843918919 0.028955223885714285 0.02
0 0.5690466162637243 0.575866727894308 0.6135705640977679 0.01787556769594595 0.011955578791428572 0.012583750322857143
0 0.597423467615076 0.5820628943514509 0.5481984916663393 0.044175385202702706 0.032475436085714286 0.03422389485714286
0 0.5399910824799408 0.6006216033800224 0.5811777635634822 0.030588255902027028 0.03576169725714286 0.03974916374285714
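
Since the objects are small, it may be worth sanity-checking the label file programmatically. A minimal sketch, assuming MedYOLO's normalized (class, z_center, x_center, y_center, depth, width, height) label layout and the default img_size of 350 seen in the log below; the file path is hypothetical:

# label_check.py: sanity-check a MedYOLO-style 3D label file.
# Assumes each line is: class z x y d w h, all values normalized to [0, 1].
IMG_SIZE = 350  # default training edge length, per the autoanchor log below

with open('example_label.txt') as f:  # hypothetical path
    for line_no, line in enumerate(f, 1):
        cls, z, x, y, d, w, h = (float(v) for v in line.split())
        assert cls.is_integer() and cls >= 0, f'line {line_no}: bad class {cls}'
        assert all(0.0 <= v <= 1.0 for v in (z, x, y, d, w, h)), \
            f'line {line_no}: values must be normalized to [0, 1]'
        # Edge lengths of the box after resizing to the training grid:
        dz, dx, dy = d * IMG_SIZE, w * IMG_SIZE, h * IMG_SIZE
        if min(dz, dx, dy) < 4.0:  # threshold used by the autoanchor warning
            print(f'line {line_no}: tiny object {dz:.1f} x {dx:.1f} x {dy:.1f} voxels')

With the labels above, a normalized extent of 0.02 resizes to only 0.02 * 350 = 7 voxels, consistent with the "Extremely small objects" warning in the output below.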

but my output is:

autoanchor: Analyzing anchors... anchors/target = 0.00, Best Possible Recall (BPR) = 0.0003. Attempting to improve anchors, please wait...
niftianchors: WARNING: Extremely small objects found. 2 of 22255 labels are < 4.0 voxels in size.
niftianchors: Running kmeans for 18 anchors on 22253 points...
niftianchors: thr=0.25: 0.9999 best possible recall, 15.48 anchors past thr
niftianchors: n=18, img_size=350, metric_all=0.481/0.829-mean/best, past_thr=0.531-mean: 4,4,4,  5,5,5,  7,6,6,  6,8,6,  8,7,7,  8,8,10,  8,11,7,  11,8,8,  11,11,9,  11,10,13,  10,17,9,  16,11,11,  13,17,14,  14,14,20,  23,16,17,  15,27,16,  26,32,28,  56,55,48
niftianchors: thr=0.25: 0.9999 best possible recall, 15.52 anchors past thr
niftianchors: n=18, img_size=350, metric_all=0.488/0.831-mean/best, past_thr=0.538-mean: 4,4,4,  5,5,5,  7,6,6,  6,8,6,  8,7,7,  8,8,9,  8,11,7,  11,8,8,  10,11,9,  11,10,12,  10,17,10,  16,11,11,  13,16,13,  14,13,18,  23,16,17,  15,26,16,  26,32,28,  54,56,46
autoanchor: New anchors saved to model. Update model *.yaml to use these anchors in the future.

     Epoch   gpu_mem       box       obj       cls    labels  img_size
                 all        100          0          0          0          0          0

(the same all-zero validation line repeats for every epoch shown, except one epoch:)

     Epoch   gpu_mem       box       obj       cls    labels  img_size
                 all        100       2599   0.000547   0.000385   2.22e-05   2.22e-06
Then I checked the loss. In the loss code, in the build_targets function, I printed the targets. The log line @@@ build targets @@@ the target should be (image,class,z,x,y,d,w,h) shows tensor([0.00000, 0.00000, 0.10811, 0.47003, 0.48593, 0.03041, 0.02571, 0.02571], device='cuda:0'), printed in:

    def build_targets(self, pred, targets):
        """Build targets for compute_loss()
        input targets with format (image,class,z,x,y,d,w,h)

        Args:
            pred (torch.Tensor): Example prediction, shape used to set gain
            targets (torch.Tensor): Normalized targets to scale to prediction size

        Returns:
            tcls (List[int]): classes corresponding to each target
            tbox (List[torch.Tensor]): bounding boxes corresponding to each target
            indices (List[Tuple[float]]): image, anchor, and grid indices for each detection layer
            anch (List[int]): List of anchors corresponding to each detection layer
        """
        # debug logging added to inspect the first target
        with open('/public_bme/data/xiongjl/MedYOLO/log/log.txt', 'a') as file:
            file.write(f'@@@ build targets @@@ the target should be (image,class,z,x,y,d,w,h) - {targets[0]}\n')

Why is the image value 0?

In addition, the subsequent pbox also has very strange values. I am afraid that my label format may be wrong; I have not changed anything else.

        # Losses
        for i, pi in enumerate(p):  # layer index, layer predictions
            b, a, gi, gk, gj = indices[i]  # image, anchor, gridz, gridy, gridx
            with open('/public_bme/data/xiongjl/MedYOLO/log/log.txt', 'a') as file:
                file.write(f'image, anchor, gridz, gridy, gridx is {indices[i]}\n')
            tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj

            n = b.shape[0]  # number of targets
            with open('/public_bme/data/xiongjl/MedYOLO/log/log.txt', 'a') as file:
                file.write(f'### the number of target ### {n}\n')
            if n:
                ps = pi[b, a, gi, gk, gj]  # prediction subset corresponding to targets

                # Regression
                pbox = box_regression(ps, anchors[i])  # predicted box
                with open('/public_bme/data/xiongjl/MedYOLO/log/log.txt', 'a') as file:
                    file.write(f'@@@@ pred box @@@@ : pbox is {pbox}\n')
                    file.write(f'&&& iou &&& pbox is {pbox.T}\n the tbox is {tbox[i].T}\n')
                iou = bbox_iou(pbox.T, tbox[i], z1x1y1z2x2y2=False) #, CIoU=True)  # iou(prediction, target)
                lbox += (1.0 - iou).mean()  # iou loss

The output is:


image, anchor, gridz, gridy, gridx is (tensor([0, 0, 0,  ..., 7, 7, 7], device='cuda:0'), tensor([0, 0, 0,  ..., 5, 5, 5], device='cuda:0'), tensor([ 4, 13, 13,  ..., 33, 36, 43], device='cuda:0'), tensor([21, 14, 15,  ..., 24, 26, 26], device='cuda:0'), tensor([20, 14, 17,  ..., 18, 18, 23], device='cuda:0'))
### the number of target ### 5548
@@@@ pred box @@@@ : pbox is tensor([[ 0.62988,  0.91699,  0.26953,  1.54366,  1.48429,  1.33087],
        [-0.14502,  0.42041,  0.74219,  1.36800,  1.38059,  1.17655],
        [ 0.48145,  0.95215,  0.43604,  1.06346,  1.15011,  0.97057],
        ...,
        [ 0.21875,  0.30908, -0.29175,  1.34358,  1.09524,  0.76166],
        [ 0.29688,  0.19531, -0.28418,  1.19297,  0.96478,  0.65771],
        [ 0.14502,  0.21094, -0.15601,  1.32380,  1.10110,  0.78635]], device='cuda:0', grad_fn=<CatBackward0>)
&&& iou &&& pbox is tensor([[ 0.62988, -0.14502,  0.48145,  ...,  0.21875,  0.29688,  0.14502],
        [ 0.91699,  0.42041,  0.95215,  ...,  0.30908,  0.19531,  0.21094],
        [ 0.26953,  0.74219,  0.43604,  ..., -0.29175, -0.28418, -0.15601],
        [ 1.54366,  1.36800,  1.06346,  ...,  1.34358,  1.19297,  1.32380],
        [ 1.48429,  1.38059,  1.15011,  ...,  1.09524,  0.96478,  1.10110],
        [ 1.33087,  1.17655,  0.97057,  ...,  0.76166,  0.65771,  0.78635]], device='cuda:0', grad_fn=<PermuteBackward0>)
 the tbox is tensor([[ 0.75676,  0.24894,  0.32599,  ...,  0.25907,  0.27456,  0.17241],
        [ 0.68112,  0.27186,  0.87866,  ...,  0.11922,  0.06609,  0.11370],
        [ 0.38106,  0.84903,  0.17525,  ..., -0.20863, -0.23744, -0.13991],
        [ 1.33784,  1.35610,  1.02143,  ...,  0.82759,  0.78145,  0.68966],
        [ 1.13143,  1.56620,  1.30424,  ...,  0.75428,  0.90087,  0.62857],
        [ 1.13143,  1.71985,  1.83988,  ...,  0.75429,  0.75878,  0.62857]], device='cuda:0')
JDSobek commented 6 months ago

Without a good set of pretrained weights, it can take a long time for the model to start figuring out enough about the objects of interest to start fitting them. Since the model is not really trained, your predictions will be essentially random. The values in the predictions you shared look normal, even though they clearly aren't accurate. Programmatically it can predict objects that are only partially inside the image, though I don't have a dataset to test how well it does that.

I think the labels look correctly formatted. This model doesn't tend to optimize well for very small objects (e.g. I haven't been able to get it to optimize on LIDC), which might be your problem. A sliding-window/patch-based architecture like nnDetection, or one of the lung nodule models, might work better for you if you need to detect very small objects. In my experience, MedYOLO works better for localizing whole organs and larger regions that would span several patches/windows.
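
To put rough numbers on the downsampling problem: a minimal sketch, assuming the default 350-voxel img_size and YOLOv5-style detection strides of 8/16/32 (the strides are an assumption here):

# How large the smallest labels above end up on each detection grid,
# assuming a 350-voxel input and YOLO-style strides of 8, 16, and 32.
IMG_SIZE = 350
norm_extent = 0.02                # smallest normalized box edge in the labels above
voxels = norm_extent * IMG_SIZE   # 7.0 voxels after resizing
for stride in (8, 16, 32):
    print(f'stride {stride:2d}: {voxels / stride:.2f} grid cells')  # 0.88, 0.44, 0.22

An object that covers well under one grid cell on every detection layer gives the network very little signal to fit.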

image in (image,class,z,x,y,d,w,h) is the batch index, so you're seeing 0 because it's the first image in the batch. It's not the imaging data, it's just tracking which predictions belong to which training example.
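
For reference, that index is typically stamped onto the labels in the dataloader's collate function. A sketch of the YOLOv5-style pattern MedYOLO is derived from (not the exact MedYOLO source):

import torch

def collate_fn(batch):
    # batch: list of (volume, labels) pairs; each labels tensor has shape (n_i, 8)
    # with the first column reserved for the image index within the batch.
    imgs, labels = zip(*batch)
    for i, l in enumerate(labels):
        l[:, 0] = i  # batch index, so build_targets() can tell volumes apart
    return torch.stack(imgs, 0), torch.cat(labels, 0)

So targets[0] having image == 0 just means the first target belongs to the first volume in its batch.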

If I'm reading this right you have ~22,000 labeled objects, so the model should have a good chance to figure things out, but if your anchors are bad (re: the other issue) there might still be problems.
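
For context on the autoanchor log above: BPR (best possible recall) is the fraction of labels that at least one anchor matches within a size-ratio threshold. A minimal 3D sketch of the YOLOv5-style check, an approximation rather than the exact MedYOLO source:

import torch

def best_possible_recall(sizes, anchors, thr=4.0):
    """sizes: (n, 3) label sizes in voxels; anchors: (m, 3) anchor sizes in voxels."""
    r = sizes[:, None] / anchors[None]      # (n, m, 3) per-dimension size ratios
    x = torch.min(r, 1 / r).min(2).values   # worst-dimension match per label/anchor pair
    best = x.max(1).values                  # best anchor for each label
    return (best > 1 / thr).float().mean().item()  # fraction of matchable labels

A BPR near 0, as in the first autoanchor line, means almost no label could be fit by the initial anchors, which is why autoanchor re-ran kmeans.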

xiongjiuli commented 6 months ago

Thank you very much for your answer. Regarding small objects, you mentioned sliding-window/patch-based approaches. I was wondering if I could first cut each image into 100 x 100 x 100 patches and use those as input images, so that during training each patch is resized to 350 x 350 x 350 and the objects become relatively large within the input. Have you tried something similar with a previous nodule dataset?

JDSobek commented 6 months ago

If GPU memory allows, you can resize your incoming data to a size larger than 350x350x350 (e.g. 512x512x512), although I've never gotten good results doing that on datasets that weren't working at 350.

I think the problem comes from the downsampling making small objects disappear before they get deep into the network, so maybe breaking the images up into much smaller patches will provide enough label volume after resizing for the objects of interest to survive deep enough into the network to make useful predictions. I haven't tried it though. I've seen the model work really well on objects that are common in the dataset (e.g. hearts, livers), and not so well on objects that are rare... though my datasets have been fairly small (<~1000).

You'll need to write some code to consolidate the predictions back into single images if you try the patch approach (a sketch follows below); there might be other concerns I'm unaware of too. If it works, that would be interesting, though whether it's worth the effort compared to using a model that natively does patching is an open question.
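
For the consolidation step, a minimal sketch of mapping per-patch predictions back to the full volume's voxel coordinates. The patch size, the (z, x, y, d, w, h) normalized box layout, and the helper names are assumptions here, not MedYOLO API:

# Sketch: merge per-patch predictions back into one volume's coordinate frame.
# Assumes each prediction is (z, x, y, d, w, h) normalized to its patch, and
# each patch is identified by its (z0, x0, y0) voxel origin in the full volume.
PATCH = (100, 100, 100)  # hypothetical patch size used at inference

def patch_to_global(pred, origin):
    """pred: (z, x, y, d, w, h) normalized to the patch.
    origin: (z0, x0, y0) voxel offset of the patch in the full volume."""
    z, x, y, d, w, h = pred
    z0, x0, y0 = origin
    return (z0 + z * PATCH[0], x0 + x * PATCH[1], y0 + y * PATCH[2],
            d * PATCH[0], w * PATCH[1], h * PATCH[2])

# After converting every patch's predictions, duplicates from overlapping
# patches still need to be merged, e.g. with a 3D non-maximum suppression.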