xiongjiuli closed this issue 5 months ago
Without a good set of pretrained weights, it can take a long time for the model to learn enough about the objects of interest to start fitting them. Since the model is not really trained yet, your predictions will be essentially random. The values in the predictions you shared look normal, even though they clearly aren't accurate. Programmatically it can predict objects that are only partially inside the image, though I don't have a dataset to test how well it does that.
I think the labels look correctly formatted. This model doesn't tend to optimize well for very small objects (e.g. I haven't been able to get it to optimize on LIDC), which might be your problem. A sliding-window/patch-based architecture like nnDetection, or one of the lung nodule models, might work better for you if you need to detect very small objects. In my experience, MedYOLO works better for localizing whole organs and larger regions that would span several patches/windows.
`image` in (image,class,z,x,y,d,w,h) is the batch index, so you're seeing 0 because it's the first image in the batch. It's not the imaging data, it's just tracking which predictions belong to which training example.
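To make the batch-index column concrete, here is a minimal sketch (hypothetical numbers, pure Python) of how YOLO-style loaders typically build the targets tensor by prepending each label row with the index of its image within the batch:

```python
# Each inner list holds one image's labels in (class, z, x, y, d, w, h) form,
# with coordinates normalized to [0, 1]. Values here are made up.
labels_per_image = [
    [[0, 0.108, 0.470, 0.486, 0.030, 0.026, 0.026]],  # image 0 in the batch
    [[0, 0.520, 0.330, 0.400, 0.050, 0.040, 0.040]],  # image 1 in the batch
]

targets = []
for batch_idx, labels in enumerate(labels_per_image):
    for lbl in labels:
        # Prepend the batch index -> (image, class, z, x, y, d, w, h)
        targets.append([batch_idx] + lbl)

print([row[0] for row in targets])  # -> [0, 1]
```

So with a batch size of 1 (or for the first image of every batch), that first column is always 0, which is exactly what you printed.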
If I'm reading this right you have ~22,000 labeled training examples, so the model should have a good chance to figure things out, but if your anchors are bad (re: the other issue) there could still be problems.
Thank you very much for your answer. Regarding small objects, you mentioned sliding-window/patch-based approaches. I was wondering if I could first cut the image into 100x100x100 patches and use those as my input images, so that during training each patch would be resized to 350x350x350 and the objects would become relatively large compared to the whole input. Have you tried something similar with the earlier nodule dataset?
If GPU memory allows you can resize your incoming data to a size larger than 350x350x350 (e.g. 512x512x512), although I've never gotten good results doing that on datasets that weren't working at 350.
I think the problem comes from the downsampling making small objects disappear before they get deep into the network, so maybe breaking the images up into much smaller patches will provide enough label volume after resizing for the objects of interest to survive deep enough into the network to make useful predictions. I haven't tried it though. I've seen the model work really well on objects that are common in the dataset (e.g. hearts, livers), and not so well on objects that are rare... though my datasets have been fairly small (<~1000).
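A back-of-the-envelope check makes the downsampling concern concrete. Assuming typical YOLO feature-map strides of 8/16/32 and a hypothetical ~10-voxel object (roughly matching the ~0.03 normalized diameter in the label you printed), you can compare how many grid cells the object spans with and without patching:

```python
# Rough illustration with assumed numbers: how large a small object looks
# at the deepest feature map, with and without patching.
obj_voxels = 10   # object diameter in the raw scan (hypothetical)
deep_stride = 32  # deepest feature-map stride in YOLO-style backbones

# Whole image resized to 350 (object size roughly unchanged if the
# original scan is near 350 voxels):
whole = obj_voxels / deep_stride

# 100-voxel patch resized to 350 magnifies the object 3.5x:
patched = obj_voxels * (350 / 100) / deep_stride

print(f"whole image:  {whole:.2f} cells at stride {deep_stride}")   # ~0.31
print(f"patch+resize: {patched:.2f} cells at stride {deep_stride}")  # ~1.09
```

Under these assumptions the object occupies well under one cell at the deepest level without patching, which is consistent with small objects "disappearing" before they reach the detection heads.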
You'll need to write some code to consolidate the predictions back into single images if you try the patch approach. There might be some other concerns I'm unaware of too. If it works that would be interesting, though whether it's worth your effort instead of using a model that will natively do this is a question.
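The consolidation step could be sketched as a coordinate transform that maps each patch-local normalized box back into whole-image normalized coordinates. This is a hypothetical helper (the function name and (z,x,y,d,w,h) layout follow the label convention discussed above); you'd still need cross-patch NMS on top of it to merge duplicate detections from overlapping patches:

```python
def patch_to_global(box, patch_origin, patch_size, image_size):
    """Map a normalized patch-space box (z, x, y, d, w, h) into
    normalized whole-image coordinates.

    patch_origin: (z, x, y) voxel offset of the patch in the full scan
    patch_size:   (z, x, y) patch dimensions in voxels
    image_size:   (z, x, y) full-scan dimensions in voxels
    """
    z, x, y, d, w, h = box
    oz, ox, oy = patch_origin
    pz, px, py = patch_size
    iz, ix, iy = image_size
    return (
        (oz + z * pz) / iz,  # center z in whole-image coordinates
        (ox + x * px) / ix,  # center x
        (oy + y * py) / iy,  # center y
        d * pz / iz,         # depth rescaled to whole-image fraction
        w * px / ix,         # width
        h * py / iy,         # height
    )

# A box centered in a 100^3 patch that starts at voxel (100, 0, 0)
# of a 400^3 scan:
print(patch_to_global((0.5, 0.5, 0.5, 0.2, 0.2, 0.2),
                      (100, 0, 0), (100, 100, 100), (400, 400, 400)))
```

After mapping, boxes from all patches of one scan can be pooled and deduplicated before evaluation.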
Sorry to bother you again. This is very confusing. After running the program, the results have consistently been wrong. I have never run anchor-based detection methods, including the YOLO series, before. My run:
My data format is:
The .txt file is below. I only have one class, starting at 0, and my objects are relatively small.
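Assuming the labels follow the normalized `class z x y d w h` convention discussed in this thread, a quick sanity check over each line of a label file might look like this (the function name is hypothetical, and the sample line reuses the values from the printed target):

```python
def check_label_line(line):
    """Validate one label line of the form: class z x y d w h
    (class is a non-negative integer starting at 0; the six
    coordinates are fractions of the image, normalized to [0, 1])."""
    parts = line.split()
    assert len(parts) == 7, f"expected 7 fields, got {len(parts)}"
    cls = parts[0]
    assert cls.isdigit(), f"class should be a non-negative integer, got {cls!r}"
    coords = [float(p) for p in parts[1:]]
    assert all(0.0 <= c <= 1.0 for c in coords), f"coords out of [0,1]: {coords}"
    return int(cls), coords

print(check_label_line("0 0.10811 0.47003 0.48593 0.03041 0.02571 0.02571"))
```

A check like this run over the whole training set would quickly surface unnormalized coordinates or swapped columns, which are common causes of "random-looking" predictions.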
But my output is:
Then I checked the loss: in the loss part, in the build_targets function, I printed the target:
@@@ bulid targets @@@ the target should be (image,class,z,x,y,d,w,h) - tensor([0.00000, 0.00000, 0.10811, 0.47003, 0.48593, 0.03041, 0.02571, 0.02571], device='cuda:0')
Why is the image value 0?
In addition, the subsequent pbox also has very strange values. I am afraid my label format may be wrong; I haven't changed anything else.
The output is: