NVlabs / mask-auto-labeler

How can I train MAL with my own dataset? #3

Closed IamRecalcitrance closed 1 year ago

IamRecalcitrance commented 1 year ago

I have some datasets with box annotations for object detection, and I would like to use them to train MAL. I hope that the trained MAL can then infer instance segmentation results. Is this possible? If so, how should I set up 'datasets/pl_data_module.py'? So far every attempt to train MAL with my configuration has failed. Best wishes.

-------------------------------------pl_data_module.py--------------------------------------------------
coco=dict(
    training_config=dict(
        train_img_data_dir='data/coco/train2017',
        val_img_data_dir='data/coco/val2017',
        test_img_data_dir='data/coco/test2017',
        dataset_type='coco',
        train_ann_path="data/coco/annotations/boxes_train2017.json",
        val_ann_path="data/coco/annotations/instances_val2017.json",
    ),
    generating_pseudo_label_config=dict(
        train_img_data_dir='data/coco/train2017',
        train_ann_path="data/coco/annotations/boxes_train2017.json",
        val_img_data_dir='data/coco/train2017',
        dataset_type='coco',
        val_ann_path="data/coco/annotations/boxes_train2017.json",
    )
),

-------------------------------error--------------------------------------------------------
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/asus/anaconda3/envs/mask-auto-labeler/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/asus/anaconda3/envs/mask-auto-labeler/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/asus/anaconda3/envs/mask-auto-labeler/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/asus/PycharmProjects/mask-auto-labeler-main/datasets/voc.py", line 231, in __getitem__
    mask = np.ascontiguousarray(maskUtils.decode(maskUtils.frPyObjects(ann['segmentation'], h, w)))
  File "pycocotools/_mask.pyx", line 293, in pycocotools._mask.frPyObjects
IndexError: list index out of range

IamRecalcitrance commented 1 year ago

When I run 'python main.py' for training, the script always tries to load masks in 'voc.py', even though my dataset_type is 'coco'. My dataset has no instance segmentation masks, only boxes.

voidrank commented 1 year ago

Hi @IamRecalcitrance, try adding the option --not_eval_mask.

IamRecalcitrance commented 1 year ago

Thank you very much for your prompt response. Running 'python main.py --not_eval_mask' solved the previous problem, but a new error then appeared (below). Does this have anything to do with 'train_ann_path="data/coco/annotations/boxes_train2017.json", val_ann_path="data/coco/annotations/instances_val2017.json"'? What exactly do boxes_train2017.json and instances_val2017.json mean?

  | Name            | Type              | Params
0 | mIoUMetric      | MIoUMetrics       | 0
1 | areaMIoUMetrics | ModuleList        | 0
2 | mean_field      | MeanField         | 0
3 | student         | MALStudentNetwork | 93.1 M
4 | teacher         | MALTeacherNetwork | 93.1 M

186 M     Trainable params
0         Non-trainable params
186 M     Total params
372.362   Total estimated model params size (MB)
Sanity Checking DataLoader 0: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.30it/s]
/home/asus/anaconda3/envs/mask-auto-labeler/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: The compute method of metric MIoUMetrics was called before the update method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)
val/mIoU: nan
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [64,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [65,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [66,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [67,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [68,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [69,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [3,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [4,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

voidrank commented 1 year ago

  1. You can use instances_train2017.json instead. boxes_train2017.json is just instances_train2017.json without the mask labels.
  2. You might need to build the Docker image from scratch.

IamRecalcitrance commented 1 year ago

I get it. Thank you again for your reply. I will keep trying.
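
For readers preparing their own box-only file: voidrank's description of boxes_train2017.json (instances_train2017.json without mask labels) could be approximated with a sketch like the one below. The function name `strip_masks` and the tiny example dict are hypothetical, and whether the repository's own file drops the 'segmentation' key entirely or keeps it empty is an assumption not confirmed in this thread.

```python
import copy


def strip_masks(coco: dict) -> dict:
    """Return a copy of a COCO-style annotation dict with mask labels removed.

    ASSUMPTION: "without mask labels" is modelled here as deleting the
    'segmentation' key from every annotation; the thread does not say
    how the repository's own boxes_train2017.json was produced.
    """
    out = copy.deepcopy(coco)  # leave the caller's data untouched
    for ann in out.get("annotations", []):
        ann.pop("segmentation", None)
    return out


# Hypothetical one-image dataset, for illustration only.
example = {
    "images": [{"id": 1, "file_name": "a.jpg", "height": 10, "width": 10}],
    "categories": [{"id": 1, "name": "cat"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [1, 2, 3, 4],
            "area": 12.0,
            "iscrowd": 0,
            "segmentation": [[1, 2, 4, 2, 4, 6, 1, 6]],
        }
    ],
}

boxes_only = strip_masks(example)
```

For a real file, the dict would come from `json.load` on the annotation JSON and the result would be written back with `json.dump`.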

voidrank commented 1 year ago

@IamRecalcitrance Have you solved your problem?

IamRecalcitrance commented 1 year ago

> @IamRecalcitrance Have you solved your problem?

Yes, I have solved this problem.

IamRecalcitrance commented 1 year ago

Because the error is raised on the GPU, it is difficult to trace it back to its root cause. The actual problem was that the number of classes in the dataset did not match the setting in "pl_data_module.py": the value in "num_class_dict" must be larger than the actual number of classes for the model to train properly.
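
A mismatch like this can be caught on the CPU before training starts. The sketch below counts the categories in a COCO-style annotation dict and compares the count with a configured class number, following the rule stated above (configured value strictly larger than the actual count). The helper names `count_categories` and `check_num_classes` are hypothetical and not part of the MAL code base.

```python
def count_categories(coco: dict) -> int:
    """Number of entries in the 'categories' list of a COCO-style dict."""
    return len(coco.get("categories", []))


def check_num_classes(coco: dict, configured: int) -> None:
    """Raise ValueError if the configured class count is too small.

    Follows the rule reported in this thread: the configured number
    must be larger than the number of categories in the dataset.
    """
    actual = count_categories(coco)
    if configured <= actual:
        raise ValueError(
            f"configured num classes = {configured}, but the annotation "
            f"file defines {actual} categories; use at least {actual + 1}"
        )


# Hypothetical two-category dataset, for illustration only.
example = {"categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}]}

check_num_classes(example, configured=3)  # passes silently
```

In practice the dict would be loaded with `json.load` from the file given as `train_ann_path`, and `configured` would be the value entered in `num_class_dict`.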

IamRecalcitrance commented 1 year ago

The key code that caused the problem, in mal.py, validation_epoch_end():

    cat_kv = dict([(cat["name"], cat["id"]) for cat in coco.categories])
    things_ids = []
    for thing in coco.things:
        things_ids.append(coco.cat_mapping[cat_kv[thing]])
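
To see how a lookup of this shape can run out of bounds, the same logic can be replayed on the CPU with plain Python, where an invalid index produces a readable IndexError instead of a CUDA assertion. Everything in this sketch is a hypothetical stand-in: the category list, the `cat_mapping` contents (assumed here to map category ids to indices starting at 1) and the per-class list are illustrative, since the thread does not show how the repository defines them.

```python
def build_things_ids(categories, things, cat_mapping):
    """Same lookup as the snippet above, written against plain Python data."""
    cat_kv = {cat["name"]: cat["id"] for cat in categories}
    return [cat_mapping[cat_kv[thing]] for thing in things]


def gather_per_class(values, things_ids):
    """Index a per-class list; raises IndexError if an id is out of range."""
    return [values[i] for i in things_ids]


# Hypothetical data: three categories whose ids map to indices 1..3.
categories = [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "car"},
    {"id": 3, "name": "dog"},
]
things = ["person", "car", "dog"]
cat_mapping = {1: 1, 2: 2, 3: 3}  # ASSUMPTION: index 0 is not used by any class

things_ids = build_things_ids(categories, things, cat_mapping)

# With 4 slots (3 classes + 1) every index is valid.
ok = gather_per_class([0.0, 0.1, 0.2, 0.3], things_ids)

# With only 3 slots, index 3 does not exist and the lookup fails.
try:
    gather_per_class([0.0, 0.1, 0.2], things_ids)
    failed = False
except IndexError:
    failed = True
```

Under these assumptions, sizing the per-class storage to the bare category count leaves the highest index without a slot, which is consistent with the advice in this thread to configure a larger class number.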

tianyufang1958 commented 1 year ago

In pl_data_module.py, the value in num_class_dict should be the actual number of classes + 1.