MendelXu / SAN

Open-vocabulary Semantic Segmentation
https://mendelxu.github.io/SAN/
MIT License
295 stars 27 forks

On the issue of self created Coco format datasets #35

Closed chunguangqu closed 9 months ago

chunguangqu commented 10 months ago

Hello, thanks for your excellent work, but I have some questions. The data I annotated with labelme has four classes, and I have converted those annotations to the stuffthingmaps format. Where do I need to make corresponding modifications to train on my own dataset? Especially in the register_coco_stuff_164k.py file: my dataset only has 4 classes and does not have the 91 COCO Stuff classes.

MendelXu commented 10 months ago

You can modify the function https://github.com/MendelXu/SAN/blob/10bf7889780ea7820be3bdd377717f28f5f13360/san/data/datasets/register_coco_stuff_164k.py#L181 like this. https://github.com/MendelXu/SAN/blob/10bf7889780ea7820be3bdd377717f28f5f13360/san/data/datasets/register_voc.py#L30

Only a list of category names is required.

chunguangqu commented 10 months ago

I modified the function as follows:

```python
CLASS_NAMES = (
    "oil cup",
    "liquid oil",
    "magnetic flap",
    "liquid water",
)

def _get_coco_stuff_meta(cat_list):
    ret = {
        "stuff_classes": cat_list,
    }
    return ret

def register_all_coco_stuff_164k(root):
    root = os.path.join(root, "coco")
    meta = _get_coco_stuff_meta(CLASS_NAMES)
```

But it is still reporting such an error:

```
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/quchunguang/detectron2/detectron2/engine/launch.py", line 123, in _distributed_worker
    main_func(*args)
  File "/home/quchunguang/003-large-model/SAN/train_net.py", line 274, in main
    return trainer.train()
  File "/home/quchunguang/detectron2/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/quchunguang/detectron2/detectron2/engine/train_loop.py", line 155, in train
    self.run_step()
  File "/home/quchunguang/detectron2/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/quchunguang/detectron2/detectron2/engine/train_loop.py", line 492, in run_step
    loss_dict = self.model(data)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/quchunguang/003-large-model/SAN/san/model/san.py", line 206, in forward
    losses = self.criterion(outputs, targets)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/quchunguang/003-large-model/SAN/san/model/criterion.py", line 234, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/quchunguang/003-large-model/SAN/san/model/matcher.py", line 184, in forward
    return self.memory_efficient_forward(outputs, targets)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/quchunguang/003-large-model/SAN/san/model/matcher.py", line 127, in memory_efficient_forward
    tgt_mask = point_sample(
  File "/home/quchunguang/detectron2/projects/PointRend/point_rend/point_features.py", line 39, in point_sample
    output = F.grid_sample(input, 2.0 * point_coords - 1.0, **kwargs)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/functional.py", line 4223, in grid_sample
    return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```

MendelXu commented 10 months ago

Did you take a deeper look into the error? The problem is triggered by grid_sample; you may want to check the input carefully.
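A device-side assert in this part of the pipeline is often caused by out-of-range label ids in the ground-truth maps. A quick sanity check could look like this — a sketch, where NUM_CLASSES and the 255 ignore value are assumptions for a 4-class detectron2-style setup:

```python
import numpy as np

NUM_CLASSES = 4      # assumption: 4 custom classes
IGNORE_LABEL = 255   # assumption: detectron2's usual ignore value

def find_bad_ids(label_map):
    """Return label ids that are neither a valid class nor the ignore value."""
    ids = np.unique(label_map)
    ok = (ids < NUM_CLASSES) | (ids == IGNORE_LABEL)
    return sorted(ids[~ok].tolist())

# Synthetic example: id 91 would be invalid for a 4-class dataset.
print(find_bad_ids(np.array([[0, 1, 255], [3, 91, 2]])))  # [91]
```

Rerunning training with CUDA_LAUNCH_BLOCKING=1, as the error message suggests, also makes the stack trace point at the real failing line.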

chunguangqu commented 10 months ago

My dataset consists of 4 classes and does not have stuff classes. Do I only need to modify the number of categories in san/config.py and the _get_coco_stuff_meta() function in san/data/datasets/register_coco_stuff_164k.py? Do I need to modify the configuration parameters of detectron2 or any other files?

MendelXu commented 10 months ago

I don't think there are other parameters. Could you share the modified code and the whole training log with me?

chunguangqu commented 10 months ago

[Uploading san-1109.zip…]()

Attached are the code and dataset I used. Could you please help me identify the issue?

MendelXu commented 10 months ago

The link seems invalid. It points to the current issue.

chunguangqu commented 10 months ago

I sent you the download link for Baidu Netdisk: Link: https://pan.baidu.com/s/182QmirMpXRqEhIIkAXIM6A?pwd=jow7 Extraction code: jow7

MendelXu commented 10 months ago

Sorry for the late reply. I think the issue is possibly that you are still using the COCO Stuff dataset: in line 212 of san/data/datasets/register_coco_stuff_164k.py, the root path still points to the coco dataset.
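If the register function is reused for a custom dataset, the hard-coded "coco" root has to change as well. A hypothetical sketch of the idea — the directory names and function name here are assumptions, not the repository's actual layout:

```python
import os

def register_all_custom(root):
    # Point at your own dataset folder instead of the hard-coded "coco".
    root = os.path.join(root, "my_dataset")  # assumption: custom dir name
    image_dir = os.path.join(root, "images", "train")
    gt_dir = os.path.join(root, "stuffthingmaps_detectron2", "train")
    return image_dir, gt_dir
```

The real function would then hand these paths to detectron2's load_sem_seg-style loader, as the existing register_coco_stuff_164k.py does for COCO.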

chunguangqu commented 10 months ago

I didn't understand what you meant. As your code is written, the training set uses the COCO Stuff format, while the validation sets use COCO Stuff, Pascal VOC-20, Pascal Context-59, and so on. My training set and validation set are both in COCO Stuff format (the path is SAN/datasets/coco/stuffthingmaps_detectron2/). It's just that I have 4 object categories, and the 91 stuff categories are also reduced to 4. So where do I need to make modifications to ensure normal training?

MendelXu commented 10 months ago

So are you sure that the data used in training is correct? For example, that the category indices in the segmentation map are 0, 1, 2, 3. I think the bug should be easy to debug: just add a breakpoint at the line where training raised the error and check the data.
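One common fix along these lines: if the annotation export wrote ids 1..4 (with 0 as background), they have to be remapped to the contiguous 0..3 range that a 4-entry stuff_classes list expects. A hedged sketch — the id mapping and the 255 ignore value are assumptions about the export, not something the thread confirms:

```python
import numpy as np

# Assumed mapping from exported ids (1..4) to contiguous train ids (0..3);
# anything else (e.g. background 0) becomes the ignore value 255.
ID_MAP = {1: 0, 2: 1, 3: 2, 4: 3}

def remap_labels(label_map):
    out = np.full_like(label_map, 255)  # start with everything ignored
    for src, dst in ID_MAP.items():
        out[label_map == src] = dst
    return out
```

Running this once over every ground-truth PNG before training keeps all indices inside the range the matcher samples from.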

chunguangqu commented 10 months ago

Thank you for your patient reply; the problem has been solved.