JDSobek / MedYOLO

A 3D bounding box detection model for medical data.
GNU Affero General Public License v3.0

During the training process of my own dataset, the P, R, and mAP values are all 0. Could you please help me solve this problem? #15

Closed smx1006 closed 3 months ago

small0universe commented 5 months ago

I have the same problem. Does anyone know how to solve it?

JDSobek commented 5 months ago

Can you tell me more about your datasets? What are you trying to detect? Is the training loss decreasing as training progresses, or is it staying largely the same throughout? The reported metrics are for the validation set, so the problem could be either a failure to optimize or a failure to generalize.

During development I've seen the model work best on large, common objects. I've noticed it often struggles to begin optimizing when the objects are very small, and it doesn't perform particularly well on rare classes. The amount of available data is also a big factor in performance: I've needed a couple hundred examples to get good results in most cases, and I've seen big improvements even going from ~400 training examples to ~600.

From what I've seen, performance is typically very binary: if the smallest version of the model trained with the default settings doesn't give you some sign of training progress and generalization to the validation set, you most likely either need more data or the objects of interest are too small for the model. I did see one odd case where the small model trains well on BraTS but the large model doesn't, and I don't have a satisfying explanation for why.

I'm not sure where the size cutoff is for an object being too small for the model, so you might want to try increasing the reshaped image size. Separately, running the validation script with your best model at a low confidence threshold might give you an idea of where your trained model is looking with its predictions, although IIRC that wasn't especially helpful when I did it.

small0universe commented 5 months ago

My dataset is chest CT data of ribs, used to detect types of fractures: 500 cases in total, with 390 in the training set. The training loss gradually decreases as training progresses, but Labels, P, R, mAP, etc. are all 0. My PyTorch version is 2.3.0, my Python version is 3.11.9, and my CUDA version is 12.0, running on an NVIDIA RTX A6000; gpu_mem is about 25 GB during training, and I'm using yolo3Ds. My labels were processed according to the template you provided: each .nii file corresponds to one .txt label file, and each line in it holds the location information for one label. But after training for more than 50 epochs, the metrics are still 0.

JDSobek commented 5 months ago

Hmm, I haven't done anything with ribs or fracture classification, so I don't have a feel for how well the model works there. Since the loss is decreasing, it at least seems to be learning something, and your dataset is probably big enough for some of what it learns to eventually generalize to the validation set.

50 epochs is very few for this model, though. With the default 1000-epoch training length, the warmup process of the One Cycle training schedule will barely have started (bear in mind you're training from scratch; pretrained weights aren't something I can provide at the moment). If you haven't already, let it train for the full duration and see if the problem persists.

smx1006 commented 5 months ago

My dataset is CBCT data for intraoral identification of the maxillary sinus and the surrounding tissue, but the data volume is small. Does CBCT itself have any impact on the network, or is this simply the result of insufficient data? I am still expanding the dataset.

smx1006 commented 5 months ago

Also, my box loss has stayed around 0.2 since training started, obj loss has dropped to 0.08, and cls loss has been 0.

JDSobek commented 5 months ago

> My dataset is CBCT data for intraoral identification of the maxillary sinus and the surrounding tissue, but the data volume is small. Does CBCT itself have any impact on the network, or is this simply the result of insufficient data? I am still expanding the dataset.

I'm pretty sure none of the testing I've done, or heard of other people doing, has used CBCT, but as long as the data is a 3D array I don't think it would introduce any different problems.

How many examples do you have? What is the rough size of the image arrays (e.g. a lot of my arrays are [512, 512, 40]), and what is an approximate size for the target bounding boxes (e.g. many of the bounding boxes in one of my datasets are close to [250, 250, 25] in size)?

> Also, my box loss has stayed around 0.2 since training started, obj loss has dropped to 0.08, and cls loss has been 0.

The framework doesn't calculate cls loss if there's only one class, so that should be 0. IIRC 0.2 is a pretty bad box loss, but it can take some time before it starts to decrease. How many epochs have you let the model train for? I don't remember how my objectness loss typically looks, but it's just trying to compare the model's confidence to the IOU of its predictions with the target, and may just be decreasing because the model is correctly reducing its confidence on bad predictions despite not making good predictions.
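To make the objectness point a bit more concrete, here is a rough sketch of how a YOLO-style objectness loss behaves; this is an illustration of the idea, not MedYOLO's actual loss code, and the helper names (`iou_fn`, `matched_idx`) are placeholders:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def objectness_loss_sketch(pred_obj_logits, pred_boxes, target_boxes, matched_idx, iou_fn):
    """Illustrative YOLO-style objectness loss (not MedYOLO's exact implementation).

    pred_obj_logits: (N,) raw objectness logits for every anchor/location
    pred_boxes, target_boxes: matched predicted and ground-truth boxes
    matched_idx: indices of anchors that were assigned a target
    iou_fn: placeholder for a function returning the IoU of paired boxes
    """
    # Target objectness is 0 everywhere except matched anchors, where it is
    # the IoU of the prediction with its assigned target.
    obj_targets = torch.zeros_like(pred_obj_logits)
    obj_targets[matched_idx] = iou_fn(pred_boxes, target_boxes).detach().clamp(min=0)
    # The loss therefore falls when the model becomes less confident about
    # bad boxes, even before the boxes themselves become accurate.
    return bce(pred_obj_logits, obj_targets)
```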

Lasa937 commented 5 months ago

Hello, I'm facing the same issue. I tried to train MedYOLO for 100 epochs to see if it was promising, but I get 0 values all the time. I have tried both with healthy controls in the training/val set and without, but it didn't change anything. The class counts of my dataset (MRI images) are reported below:

- Healthy control: 807
- FCD: 504
- HS and other: 320
- LEAT: 143
- Other: 72
- Cavernoma: 56
- Hypothalamic hamartoma: 32
- Polymicrogyria: 28
- Periventricular nodular heterotopia: 22

These are the anchor boxes that were found: niftianchors: n=18, img_size=350, metric_all=0.382/0.773-mean/best, past_thr=0.507-mean: 4,5,4, 23,22,24, 29,40,25, 30,32,46, 21,36,84, 36,62,35, 27,38,82, 33,53,59, 43,51,54, 48,56,46, 59,36,63, 52,85,49, 57,80,62, 69,115,73, 114,78,70, 67,90,107, 105,126,134, 234,172,263

Should I consider using a bigger model, or are my objects too small to detect?

JDSobek commented 5 months ago

I don't think I've ever had quick training sessions optimize. I'm fairly sure I've even seen a few training runs where the reported mAP metrics were 0 for a couple hundred epochs, and when I came back to check the results the next day the model was mostly trained. IIRC BraTS typically did that... it might extend to other MR data too. Keep an eye on the training loss values reported: if they aren't changing, that's bad; if they are dropping, the model should be learning.

For a short training run the model probably hasn't had enough time for the weights to start working together and then improving their results, but mAP is only reported for the validation set, so there's also the potential that your training and validation sets are too dissimilar in some way. If the training loss looks like it has been decreasing over time and, after you finish training, the validation metrics are still bad, try running val.py on your training set (make a second dataset.yaml that lists the training set as the validation set for this, as sketched below). That should tell you whether it's a model problem or a dataset generalization problem.
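As an illustration, a second data config for validating on the training set might look roughly like this. The keys are assumed to mirror data/example.yaml (train/val paths, nc, names), and the paths and class names below are placeholders; adapt them to your own file, then pass it to val.py in place of your normal data yaml together with your best weights and a low confidence threshold:

```yaml
# val_on_train.yaml -- hypothetical config for checking generalization.
# Paths, class count, and names are placeholders.
train: /path/to/train/images   # unchanged
val: /path/to/train/images     # validation now points at the training set
nc: 1
names: ["lesion"]
```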

I think those anchor boxes could be fine, though whatever inspired kmeans to make the (4, 5, 4) anchor box is probably too small. You might run into a problem with your rare classes being too infrequent for detection: MedYOLO doesn't do anything fancy with the sampling, and although I suppose you could weight examples, I don't think I've ever tried it. If you can combine some of the less frequent classes (LEAT might have enough examples; the rarer classes almost certainly don't) with more frequent classes, you'll probably get better results.
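If you do decide to merge rare classes, one low-tech way to do it offline is to rewrite the class index in each label file and then update nc and names in your data yaml to match. This is a minimal sketch that assumes YOLO-style .txt labels where the first token on each line is the class index; the class mapping itself is a made-up example:

```python
from pathlib import Path

# Hypothetical mapping: fold several rare class indices into one merged class.
# Replace these indices with the ordering from your own dataset.yaml.
CLASS_MAP = {5: 4, 6: 4, 7: 4, 8: 4}

def remap_labels(label_dir: str) -> None:
    """Rewrite the class index (first token of each line) in YOLO-style
    label .txt files according to CLASS_MAP, leaving coordinates untouched."""
    for txt_file in Path(label_dir).glob("*.txt"):
        new_lines = []
        for line in txt_file.read_text().splitlines():
            parts = line.split()
            if not parts:
                continue
            cls = int(float(parts[0]))
            parts[0] = str(CLASS_MAP.get(cls, cls))
            new_lines.append(" ".join(parts))
        txt_file.write_text("\n".join(new_lines) + "\n")

# Example usage (paths are placeholders):
# remap_labels("labels/train")
# remap_labels("labels/val")
```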

IME if the small version of the model doesn't detect anything, the larger versions won't either, though sometimes they fail where it succeeds. I would always start by giving the small version a 1000-epoch training run and seeing where that leaves you; especially when you don't have a lot of available training time, the larger models will just eat more of it.

shchojj commented 4 months ago

Hello, esteemed author. Firstly, I would like to express my gratitude for making this project available as open source.

I have recently encountered a similar issue during my training process. Initially, I tested on the nnDetection platform and found that it handled single targets with no overlap or containment relationships quite effectively. However, its performance was not up to the mark when dealing with multiple targets that overlap or contain one another. Attempts to fine-tune the regression function led to NaN errors, which was quite frustrating.

Subsequently, I switched to the MedYOLO project, but I observed that the loss failed to converge during training. After training, the predicted confidence values were alarmingly low, at around 0.002, which seemed unusually small. I am currently at a loss to identify the source of this issue.

I would greatly appreciate any guidance or suggestions you might be able to offer to help me troubleshoot this problem.

[attached screenshots: an example annotated image and a training excerpt]

data/example.yaml:

```yaml
# number of classes
nc: 17

# class names
names: ["background", ...]
```

shchojj commented 4 months ago

In the initial phase of my training, despite running for 200 epochs, the convergence behavior remained the same, particularly for the bounding box predictions. The metrics for precision, recall, and mean Average Precision (mAP) at the 0.5 and 0.95 Intersection over Union (IoU) thresholds all register zero, which is quite disconcerting. Additionally, the confidence scores of the model's predictions are consistently around 0.002, which is surprisingly low.

JDSobek commented 4 months ago

First, because MedYOLO has to train from scratch, it needs many epochs to optimize. 1000 epochs worked reliably for me, and from the reports I've been seeing, the model doesn't have enough time to optimize when people try to train with only 100-200 epochs. I think the fastest that early stopping ended one of my runs was around epoch 600-700, and I think there was room for the model to do better on that dataset with some more augmentation. The one-cycle learning rate schedule means only about 1/3 of your epochs are spent at a high learning rate where the model can optimize quickly: the beginning is spent annealing weights at low learning rates and the end is spent fine-tuning the model at lower learning rates, so it's really important that the model is given enough epochs to train and fine-tune itself.
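To see how much of a short run actually happens at a useful learning rate, here is a small standalone illustration of a one-cycle schedule using PyTorch's OneCycleLR; the max_lr and pct_start values are placeholders, not MedYOLO's actual hyperparameters:

```python
import torch

# Compare how many epochs sit near the peak learning rate for a short run
# versus a full-length run. All settings below are illustrative only.
for total_epochs in (100, 1000):
    model = torch.nn.Linear(1, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        opt, max_lr=0.01, total_steps=total_epochs, pct_start=0.3)
    lrs = []
    for _ in range(total_epochs):
        lrs.append(sched.get_last_lr()[0])
        opt.step()
        sched.step()
    near_peak = sum(lr > 0.005 for lr in lrs)
    print(f"{total_epochs} epochs: {near_peak} epochs spent above half the peak LR")
```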

The YOLO metrics used by MedYOLO are quite strict with their cut-offs, so they may report 0 for many epochs. For comparison, nnDetection uses the COCO metric recalculated using IoUs between 0.1 and 0.5, while the standard version of the COCO metric used by YOLO and MedYOLO starts at IoU 0.5, so the model needs to perform much better before it starts reporting non-zero mAPs. I've had zero validation metrics reported a few hundred epochs into the training process and still seen good results once training had actually finished.

Based on the example image you shared, I think your annotations should be in a size range where MedYOLO will work, and the training excerpt you showed has the loss decreasing, which is a good sign... although it's pretty early in the training process, so it's a bit hard to say for sure that it will continue. Mostly I would say try to give the model more time to train, preferably at least 1000 epochs, though if your dataset is very small you may need more. I suppose very large datasets with several thousand examples might get away with fewer, but I haven't had a dataset that large yet.

shchojj commented 4 months ago

Thank you very much for your reply. I will increase the number of epochs to improve training, and I will also evaluate data augmentation strategies at the same time. Thank you again for your reply, and best wishes.

shchojj commented 4 months ago

@JDSobek, thank you for your dedication. After increasing the number of epochs to 1000, the model began to show non-zero metrics after nearly 500 epochs, and the final result closely matched my expectations. For certain cases, it may be necessary to gather additional samples or to strengthen data augmentation to further improve the model's accuracy and generalization. [result screenshots attached]

JDSobek commented 4 months ago

Yeah, 20 cases seems very low. The smallest dataset I have tested has about 60 cases in the training set, but it had high-quality segmentations, so I created 5 rotated copies of each training example and then generated bounding boxes from the original and rotated masks; that dataset ended up with 360 training examples.

It would be nice to be able to do live rotation, but YOLOv5 (at least back when I was writing the code) relies on OpenCV to transform the labels, and that is quite picky about the data it accepts. If you figure out how to generate rotated labels, it should be fairly straightforward to add live rotation to your training loop. But if you have high-quality segmentations, it's much easier to generate rotated versions offline and compute bounding boxes from the rotated masks.
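For the offline approach, a rough sketch of producing a rotated image/mask pair and recomputing a bounding box from the rotated mask might look like the following. This is only an illustration under a few assumptions (NIfTI inputs, in-plane rotation about the first two axes, nibabel and SciPy available); it is not the pipeline the repository ships with, and the paths and angle are placeholders:

```python
import numpy as np
import nibabel as nib
from scipy.ndimage import rotate

def bbox_from_mask(mask):
    """Min/max voxel indices of the nonzero region along each axis."""
    nonzero = np.nonzero(mask)
    return [(int(idx.min()), int(idx.max())) for idx in nonzero]

def make_rotated_copy(img_path, mask_path, angle_deg, out_prefix):
    """Rotate an image/mask pair in-plane and return a bounding box computed
    from the rotated mask. Hypothetical helper: the axes=(0, 1) in-plane
    assumption and the file naming depend on your data's orientation."""
    img_nii, mask_nii = nib.load(img_path), nib.load(mask_path)
    img, mask = img_nii.get_fdata(), mask_nii.get_fdata()

    # Linear interpolation for the image, nearest-neighbour for the mask so
    # the segmentation stays binary after rotation.
    img_rot = rotate(img, angle_deg, axes=(0, 1), reshape=False, order=1)
    mask_rot = rotate(mask, angle_deg, axes=(0, 1), reshape=False, order=0)

    nib.save(nib.Nifti1Image(img_rot, img_nii.affine), f"{out_prefix}_img.nii.gz")
    nib.save(nib.Nifti1Image(mask_rot, mask_nii.affine), f"{out_prefix}_mask.nii.gz")

    # Convert this voxel-index box to your MedYOLO .txt label format
    # (e.g. normalized centers and extents) before writing the label file.
    return bbox_from_mask(mask_rot)
```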

shchojj commented 4 months ago

Thank you for your insightful reply. I guess the data augmentation and preprocessing of nnU-Net or nnDetection could provide a better solution. I am currently experimenting with the 2.5D strategy described in your articles, because it seems to be more suitable for clinical application.