JDSobek / MedYOLO

A 3D bounding box detection model for medical data.
GNU Affero General Public License v3.0

Dimension error in # Recompute the anchors if the metric is too low #11

Closed xiongjiuli closed 5 months ago

xiongjiuli commented 6 months ago

When I recompute the anchors, I encounter an error:

autoanchor: ERROR: DataLoader worker (pid 32319) is killed by signal: Killed. 
Traceback (most recent call last):
  File "/public_bme/data/xiongjl/MedYOLO/train.py", line 521, in <module>
    main(opt)
  File "/public_bme/data/xiongjl/MedYOLO/train.py", line 428, in main
    train(opt.hyp, opt, device, callbacks)
  File "/public_bme/data/xiongjl/MedYOLO/train.py", line 183, in train
    nifti_check_anchors(train_dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)
  File "/public_bme/data/xiongjl/MedYOLO/utils3D/anchors.py", line 62, in nifti_check_anchors
    new_bpr = metric(anchors)[0]
              ^^^^^^^^^^^^^^^
  File "/public_bme/data/xiongjl/MedYOLO/utils3D/anchors.py", line 43, in metric
    r = dwh[:, None] / k[None]
        ~~~~~~~~~~~~~^~~~~~~~~
RuntimeError: The size of tensor a (22255) must match the size of tensor b (3) at non-singleton dimension 1
terminate called without an active exception
Aborted

Then I found this in the function:

    def metric(k):  # compute metrics for anchors
        r = dwh[:, None] / k[None]
        # print(f'r shape is {r.shape}')
        x = torch.min(r, 1./r).min(2)[0] # ratio metric
        best = x.max(1)[0]  # best_x
        aat = (x > 1. / thr).float().sum(1).mean()  # anchors above threshold
        bpr = (best > 1. / thr).float().mean()  # best possible recall
        return bpr, aat

The anchors here originally have shape (3, 6, 3) instead of (18, 3), so k[None] becomes shape (1, 3, 6, 3), while dwh[:, None] has shape (22255, 1, 3), and the broadcast fails (a minimal sketch reproducing this is at the end of this comment). So is there a bug in this code?

    if bpr < 0.98:  # threshold to recompute
        print('. Attempting to improve anchors, please wait...')
        # anchors = anchors.view(-1, 3) # xjl add for shape 
        na = m.anchors.numel() // 3  # number of anchors
        try:
            anchors = nifti_kmean_anchors(dataset, n=na, img_size=imgsz, thr=thr, gen=1000, verbose=False)
        except Exception as e:
            print(f'{prefix}ERROR: {e}')
        new_bpr = metric(anchors)[0]
        if new_bpr > bpr:  # replace anchors
            anchors = torch.tensor(anchors, device=m.anchors.device).type_as(m.anchors)
            m.anchors[:] = anchors.clone().view_as(m.anchors) / m.stride.to(m.anchors.device).view(-1, 1, 1)  # loss
            # check_anchor_order(m)  # behavior is hard to control and makes setting custom anchors unintuitive
            print(f'{prefix}New anchors saved to model. Update model *.yaml to use these anchors in the future.')
        else:
            print(f'{prefix}Original anchors better than new anchors. Proceeding with original anchors.')
    print('')  # newline
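For reference, a minimal sketch of the broadcast failure described above (the shapes are taken from the traceback; the working case assumes the anchors are flattened to (18, 3)):

```python
import torch

dwh = torch.rand(22255, 3)       # one (depth, width, height) row per label
k_flat = torch.rand(18, 3)       # anchors flattened to (na, 3)
r = dwh[:, None] / k_flat[None]  # (22255, 1, 3) / (1, 18, 3) -> (22255, 18, 3), works

k_nested = torch.rand(3, 6, 3)   # anchors still grouped per detection level, as in m.anchors
# dwh[:, None] / k_nested[None]  # (22255, 1, 3) / (1, 3, 6, 3) -> RuntimeError at dim 1
```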
JDSobek commented 6 months ago

Can you give me more detail about what you are doing to manually recompute the anchors? Can you share your model.yaml? This looks like you separated each anchor box into its own tuple, but they should be like in the provided model yamls:

anchors: # depth, width, height of the anchor boxes
- [332,152,155, 250,177,177, 283,173,161, 277,160,181, 306,158,172, 268,168,203]  # P3/8
- [332,171,161, 303,178,172, 277,195,171, 337,156,179, 285,181,188, 256,201,194]  # P4/16
- [309,176,199, 335,188,170, 336,175,187, 296,198,205, 331,207,192, 302,232,234]  # P5/32

The anchors at each detection "level" aren't separated by anything; the code just parses each anchor list in groups of 3 to determine which value belongs to which anchor. It's not as transparent as I'd like, but that's how it is (or was) done in YOLOv5, so it's done here to maintain parity.
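As a rough illustration of that grouping (a sketch, not the actual MedYOLO parsing code; the values are copied from the P3 line above):

```python
import torch

p3 = [332,152,155, 250,177,177, 283,173,161, 277,160,181, 306,158,172, 268,168,203]
anchors_p3 = torch.tensor(p3, dtype=torch.float).view(-1, 3)  # parse by 3's -> (6, 3)
print(anchors_p3.shape)  # torch.Size([6, 3]): 6 anchors, each (depth, width, height)
```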

xiongjiuli commented 6 months ago

I added `k = torch.tensor(k).cpu()` into the metric function to solve the problem:

    def metric(k):  # compute metrics for anchors
        k = torch.tensor(k).cpu()
        r = dwh[:, None] / k[None]
        x = torch.min(r, 1./r).min(2)[0] # ratio metric
        best = x.max(1)[0]  # best_x
        aat = (x > 1. / thr).float().sum(1).mean()  # anchors above threshold
        bpr = (best > 1. / thr).float().mean()  # best possible recall
        return bpr, aat

Then the output is:

anchors/target = 0.00, Best Possible Recall (BPR) = 0.0004. Attempting to improve anchors, please wait...
niftianchors: WARNING: Extremely small objects found. 2 of 22255 labels are < 4.0 voxels in size.
niftianchors: Running kmeans for 18 anchors on 22253 points...
niftianchors: thr=0.25: 0.9999 best possible recall, 15.72 anchors past thr
niftianchors: n=18, img_size=350, metric_all=0.490/0.829-mean/best, past_thr=0.535-mean: 4,4,4,  5,5,5,  6,7,5,  7,6,6,  7,7,8,  7,10,7,  10,8,7,  9,7,10,  10,11,9,  14,9,9,  10,11,13,  10,16,10,  14,14,12,  14,15,19,  22,14,15,  14,25,15,  24,30,25,  49,49,44
niftianchors: Evolving anchors with Genetic Algorithm: fitness = 0.8309: 100%|████████████████████| 1000/1000 [01:24<00:00, 11.79it/s]
niftianchors: thr=0.25: 0.9999 best possible recall, 15.84 anchors past thr
niftianchors: n=18, img_size=350, metric_all=0.499/0.831-mean/best, past_thr=0.543-mean: 4,4,4,  6,5,5,  6,7,6,  7,6,6,  7,7,8,  9,8,7,  8,10,7,  9,8,10,  9,11,9,  13,9,9,  11,10,13,  10,15,10,  15,14,12,  14,15,19,  13,23,14,  22,15,15,  23,30,24,  49,53,44
autoanchor: New anchors saved to model. Update model *.yaml to use these anchors in the future.

Is this normal output? For the yaml files, I am using the provided ones.

JDSobek commented 6 months ago

The statements look normal, although the values may be off if something else is wrong.

I'm having trouble replicating the error. Even when I force the program to recompute the anchors, everything runs fine. I still need to know what initial anchors you are using, although it may be a scipy version issue if they have changed the output of kmeans in a new version. What version of scipy are you using?

xiongjiuli commented 6 months ago

Name: scipy
Version: 1.12.0

Thank you very much for your answer. I am using this experiment as a comparison experiment. Although my objects are small, I cut the images into small patches in advance to make sure they are not resized to a smaller size, and each side length is less than or equal to the default img size of 350. I wonder if my anchor recomputation is wrong: since the objects are small, should the anchor priors also be smaller?

About the recompute: to solve the dimension problem, I added `anchors = anchors.view(-1, 3)  # xjl add` in the "# Recompute the anchors if the metric is too low" part.

```
 # Recompute the anchors if the metric is too low
if bpr < 0.98:  # threshold to recompute
    print('. Attempting to improve anchors, please wait...')
    anchors = anchors.view(-1, 3) # xjl add 
    na = m.anchors.numel() // 3  # number of anchors
    try:
        anchors = nifti_kmean_anchors(dataset, n=na, img_size=imgsz, thr=thr, gen=1000, verbose=False)
    except Exception as e:
        print(f'{prefix}ERROR: {e}')
    new_bpr = metric(anchors)[0]
    if new_bpr > bpr:  # replace anchors
        anchors = torch.tensor(anchors, device=m.anchors.device).type_as(m.anchors)
        m.anchors[:] = anchors.clone().view_as(m.anchors) / m.stride.to(m.anchors.device).view(-1, 1, 1)  # loss
        # check_anchor_order(m)  # behavior is hard to control and makes setting custom anchors unintuitive
        print(f'{prefix}New anchors saved to model. Update model *.yaml to use these anchors in the future.')
    else:
        print(f'{prefix}Original anchors better than new anchors. Proceeding with original anchors.')
print('')  # newline
```
JDSobek commented 6 months ago

Hmm, I'm on SciPy 1.11.4, but the documentation doesn't suggest there have been any changes to the kmeans function, so I don't think that explains it.

`anchors = anchors.view(-1, 3)  # xjl add` was a change I thought might fix the problem if I had been able to reproduce it, but what I don't understand is why you need that change when my datasets don't, even when I force the recomputation.

`anchors = nifti_kmean_anchors(dataset, n=na, img_size=imgsz, thr=thr, gen=1000, verbose=False)` is where the problems are coming from. The only thing it takes from your model's anchors is `na`, but it uses `numel` to do so, so if the anchors are formatted incorrectly (for example, nested an extra time or not enough), that number will be wrong and that will create a problem. The other thing it uses is your dataset's labels to calculate the new anchors, but from the other question your label format looks correct.
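For example, a sketch of how `na` comes out for the sample anchors above (assuming the detect head stores its anchors as a (levels, anchors, 3) tensor):

```python
import torch

m_anchors = torch.zeros(3, 6, 3)  # 3 detection levels x 6 anchors x (depth, width, height)
na = m_anchors.numel() // 3       # 54 // 3 = 18, matching "Running kmeans for 18 anchors"
print(na)                         # 18
```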

What model yaml file are you using? Can you print `na` and `anchors` before and after `nifti_kmean_anchors`? I think the problem is in one of those two values.
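Something like this (a rough sketch placed around the `nifti_kmean_anchors` call in `utils3D/anchors.py`; the variable names follow the snippet quoted earlier in the thread) would show both values:

```python
print(f'{prefix}na = {na}, anchors before: shape={tuple(m.anchors.shape)}')
print(m.anchors)
anchors = nifti_kmean_anchors(dataset, n=na, img_size=imgsz, thr=thr, gen=1000, verbose=False)
print(f'{prefix}anchors after: type={type(anchors)}, shape={getattr(anchors, "shape", None)}')
print(anchors)
```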