JDSobek / MedYOLO

A 3D bounding box detection model for medical data.
GNU Affero General Public License v3.0

How to make the dataset #18

Closed kennys-cs1105 closed 2 months ago

kennys-cs1105 commented 3 months ago

My dataset is lung nodule CT data, and I am trying to detect the nodules. My labels are in the order: class-number z-center x-center y-center z-length x-length y-length. For example: 1 0.812500 0.001563 0.843750 0.044305 0.044305 0.044305. Why does training report the following?

images and labels...0 found, 0 missing, 0 empty, 368 corrupted: 100%
~
AssertionError: No labels in /data/MedData/mydata/labels/train.cache. Can not train without labels.

Could you please help me solve this problem? Thank you!

JDSobek commented 3 months ago

In my tests MedYOLO doesn't work well with lung nodules, so you might want to try nnDetection or some other model. I don't do a lot of work with lung nodules, so that could just have been my unfamiliarity with the subject, but I'd rather warn you before you spend a lot of time preparing data for this model.

This error might be a bit hard to diagnose. I only ran into corrupt labels once and I don't remember what specifically I had done wrong, so this might take some troubleshooting. First thing I would do is delete the train.cache and val.cache files, in case they're doing something funny.
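A minimal sketch of the cache cleanup, assuming the cache files sit directly inside the labels folder, as the path in the error message suggests. The helper name `remove_label_caches` is made up for illustration and is not part of MedYOLO.

```python
from pathlib import Path


def remove_label_caches(labels_dir):
    """Delete every *.cache file directly inside labels_dir.

    Returns the list of paths that were removed, so the caller can
    see what was cleaned up. Subfolders are left untouched.
    """
    removed = []
    for cache_file in sorted(Path(labels_dir).glob("*.cache")):
        cache_file.unlink()
        removed.append(cache_file)
    return removed


# Example call, using the labels folder from the error message above:
# remove_label_caches("/data/MedData/mydata/labels")
```

The caches should be rebuilt the next time training scans the dataset, so deleting them loses nothing.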

Next, I'd double check your image files and label files are named correctly (e.g. the images are like img01.nii.gz and the corresponding labels are like img01.txt). I don't think that will report corruption, but maybe.
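The naming check can be sketched as below. This assumes images end in `.nii.gz` and labels in `.txt`, and that each split has one images folder and one labels folder. The function name `find_unpaired` and the folder arguments are hypothetical, not MedYOLO code.

```python
from pathlib import Path


def find_unpaired(images_dir, labels_dir):
    """Compare base names of *.nii.gz images and *.txt labels.

    Returns two sorted lists: images with no label file, and
    labels with no image file.
    """
    image_names = {
        p.name[: -len(".nii.gz")]
        for p in Path(images_dir).iterdir()
        if p.name.endswith(".nii.gz")
    }
    label_names = {
        p.name[: -len(".txt")]
        for p in Path(labels_dir).iterdir()
        if p.name.endswith(".txt")
    }
    images_without_labels = sorted(image_names - label_names)
    labels_without_images = sorted(label_names - image_names)
    return images_without_labels, labels_without_images
```

If both returned lists are empty, every image has a label file with the same base name and vice versa.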

Then, double check that your dataset.yaml file has the correct number of classes. MedYOLO/YOLOv5 labels are zero-indexed, so if you told the dataset.yaml that you had 1 class, it will expect that class number to be 0. I like to have my classes be one-indexed for conversion into masks, so for a single-class dataset I typically make the dataset.yaml like example.yaml: I say it is a 2 class dataset, and then I name the first class (i.e. class 0 or 'ObjectClass1' in example.yaml) something like 'NA' to indicate to other people that there are no actual labels for class 0 and it's just a dummy class. That's my first guess for why it's reporting corrupted labels, since the zero-indexing is kind of unintuitive.
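To see whether the class index is the problem, a rough label check along these lines can help. It is only a sketch and not MedYOLO's actual validation. It assumes seven whitespace-separated fields per line, as in the label format described in the question, and it assumes the six position and size values are normalized to the range 0 to 1. The function name `check_label_line` is made up for illustration.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one label line (empty if none).

    Assumed format: class z-center x-center y-center z-length x-length y-length
    with a zero-indexed integer class and values normalized to [0, 1].
    """
    fields = line.split()
    if len(fields) != 7:
        return [f"expected 7 fields, found {len(fields)}"]

    try:
        class_id = int(fields[0])
    except ValueError:
        return [f"class '{fields[0]}' is not an integer"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(
            f"class {class_id} is outside the valid range 0..{num_classes - 1}"
        )

    try:
        values = [float(v) for v in fields[1:]]
    except ValueError:
        return problems + ["position or size value is not a number"]

    if any(v < 0.0 or v > 1.0 for v in values):
        problems.append("position or size value is outside [0, 1]")
    return problems
```

Run on the example label from the question, `check_label_line("1 0.812500 0.001563 0.843750 0.044305 0.044305 0.044305", num_classes=1)` reports that class 1 is out of range, while the same line with `num_classes=2` returns an empty list. That matches the dummy-class setup described above.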

If that's not the problem, post the contents of your dataset.yaml and maybe the layout of the folders containing your dataset images/labels.

kennys-cs1105 commented 3 months ago

OK, thank you! I will check my data and try some other models.