Closed: chunguangqu closed this issue 9 months ago.
You can modify the function https://github.com/MendelXu/SAN/blob/10bf7889780ea7820be3bdd377717f28f5f13360/san/data/datasets/register_coco_stuff_164k.py#L181 following the example in https://github.com/MendelXu/SAN/blob/10bf7889780ea7820be3bdd377717f28f5f13360/san/data/datasets/register_voc.py#L30
Only a list of category names is required.
I will modify the function as follows:

```python
CLASS_NAMES = (
    "oil cup",
    "liquid oil",
    "magnetic flap",
    "liquid water",
)

def _get_coco_stuff_meta(cat_list):
    ret = {
        "stuff_classes": cat_list,
    }
    return ret
```
But it still reports this error:

```
File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/quchunguang/detectron2/detectron2/engine/launch.py", line 123, in _distributed_worker
    main_func(*args)
  File "/home/quchunguang/003-large-model/SAN/train_net.py", line 274, in main
    return trainer.train()
  File "/home/quchunguang/detectron2/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/quchunguang/detectron2/detectron2/engine/train_loop.py", line 155, in train
    self.run_step()
  File "/home/quchunguang/detectron2/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/quchunguang/detectron2/detectron2/engine/train_loop.py", line 492, in run_step
    loss_dict = self.model(data)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/quchunguang/003-large-model/SAN/san/model/san.py", line 206, in forward
    losses = self.criterion(outputs, targets)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/quchunguang/003-large-model/SAN/san/model/criterion.py", line 234, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/quchunguang/003-large-model/SAN/san/model/matcher.py", line 184, in forward
    return self.memory_efficient_forward(outputs, targets)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/quchunguang/003-large-model/SAN/san/model/matcher.py", line 127, in memory_efficient_forward
    tgt_mask = point_sample(
  File "/home/quchunguang/detectron2/projects/PointRend/point_rend/point_features.py", line 39, in point_sample
    output = F.grid_sample(input, 2.0 * point_coords - 1.0, **kwargs)
  File "/home/quchunguang/anaconda3/envs/mmdet-sam/lib/python3.8/site-packages/torch/nn/functional.py", line 4223, in grid_sample
    return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
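As the traceback itself suggests, the first step with an asynchronously reported `device-side assert triggered` error is to make kernel launches synchronous, so the stack trace points at the kernel that actually failed. A minimal sketch, assuming you set the variable at the very top of `train_net.py`, before torch initializes CUDA:

```python
# Force synchronous CUDA kernel launches so a device-side assert surfaces
# at the Python line that actually triggered it. Must be set before any
# code initializes CUDA (i.e. before importing/using torch on the GPU).
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ... then import torch and launch training as usual.
```

Setting it in the shell (`CUDA_LAUNCH_BLOCKING=1 python train_net.py ...`) is equivalent.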
Did you have a deeper look into the error? The problem is triggered by `grid_sample`; you may carefully check the input.
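For context, a device-side assert reached through the matcher's `point_sample`/`grid_sample` call very often traces back to out-of-range category indices in the targets (e.g. labels left over from the full COCO-Stuff mapping while the model now expects 4 classes). A minimal, hypothetical sanity check (names like `check_labels` are mine, not SAN's; 255 is detectron2's default ignore value):

```python
import numpy as np

NUM_CLASSES = 4    # the custom dataset discussed in this issue
IGNORE_LABEL = 255 # detectron2's default ignore_label

def check_labels(sem_seg: np.ndarray) -> bool:
    """Return True if every pixel is a valid class index or the ignore label."""
    valid = np.isin(sem_seg, list(range(NUM_CLASSES)) + [IGNORE_LABEL])
    return bool(valid.all())

ok = check_labels(np.array([[0, 1], [2, 255]], dtype=np.uint8))   # True
bad = check_labels(np.array([[0, 90], [2, 255]], dtype=np.uint8)) # False: 90 is a stale COCO-Stuff index
```

Running such a check on every training sample before the crash narrows the problem down without needing a CUDA debugger.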
My dataset consists of 4 classes and does not have a stuff class. Do I only need to modify the number of categories in `SAN/san/config.py` and the `_get_coco_stuff_meta()` function in `SAN/san/data/datasets/register_coco_stuff_164k.py`? Do I need to modify the configuration parameters for detectron2 or other files?
I think there should not be other parameters. Is it possible to share the modified code and the whole training log with me?
[Uploading san-1109.zip…]()
Attached are the code and dataset I used. Could you please help me identify the issue?
The link seems invalid. It points to the current issue.
I sent you the download link for Baidu Netdisk: Link: https://pan.baidu.com/s/182QmirMpXRqEhIIkAXIM6A?pwd=jow7 Extraction code: jow7
Sorry for the late reply. I think the issue is possibly that you are still using the COCO-Stuff datasets. In line 212 of `san/data/datasets/register_coco_stuff_164k.py`, the root path still points to the coco dataset.
I didn't understand what you meant. As expressed in your code, the training set uses the COCO-Stuff format, while the validation set uses COCO-Stuff, Pascal VOC-20, Pascal Context-59, and so on. So my training set and validation set are both in COCO-Stuff format (the path is `SAN/datasets/coco/stuffthingmaps_detectron2/`). It's just that I have 4 object categories, and the 91 stuff categories are likewise reduced to 4. So I would like to ask what I need to modify to ensure normal training?
So are you sure that the data used in training is correct? For example, that the category indices in the segmentation map are 0, 1, 2, 3. I think the bug is very easy to debug... Just add a breakpoint at the line where your training raised the error and check the data.
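The check suggested above can also be scripted rather than done at a breakpoint. A small sketch, assuming the expected indices are the 4 classes plus detectron2's default `ignore_label` of 255 (the function name is mine):

```python
import numpy as np

# Category indices the model expects: 4 classes plus the ignore label.
EXPECTED = {0, 1, 2, 3, 255}

def unexpected_indices(sem_seg: np.ndarray) -> set:
    """Return category indices present in a ground-truth map but not in EXPECTED."""
    return set(np.unique(sem_seg).tolist()) - EXPECTED

# e.g. a map still holding COCO-Stuff index 90 is flagged:
flagged = unexpected_indices(np.array([[0, 1], [90, 255]], dtype=np.uint8))  # {90}
clean = unexpected_indices(np.array([0, 1, 2, 3, 255], dtype=np.uint8))      # set()
```

Loading each PNG under `stuffthingmaps_detectron2/` (e.g. with `np.asarray(PIL.Image.open(path))`) and running this over the whole dataset pinpoints any file with stale indices.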
Thank you for your patient reply; the problem has been solved.
Hello, thanks for your excellent work, but I have some questions I need to consult you about: the data I annotated with labelme has four classes, and I have converted it to the stuffthingmaps format. Where do I need to make the corresponding modifications when training with my own dataset? Especially in the `register_coco_stuff_164k.py` file: my dataset only has 4 classes and does not have the 91-class COCO-Stuff labels.
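On the conversion itself: stuffthingmaps-style ground truth is a single-channel uint8 image holding one contiguous category index per pixel, with 255 for ignored pixels. A hedged sketch of the remapping step, assuming hypothetical labelme ids 1-4 with 0 as background (the mapping and names are illustrative, not from SAN):

```python
import numpy as np

# Hypothetical labelme ids -> contiguous training indices.
# 0 (background) is mapped to 255, detectron2's default ignore label.
ID_MAP = {0: 255, 1: 0, 2: 1, 3: 2, 4: 3}

def remap(mask: np.ndarray) -> np.ndarray:
    """Remap labelme ids to contiguous category indices for stuffthingmaps."""
    out = np.full_like(mask, 255)  # unknown ids also fall through to ignore
    for src, dst in ID_MAP.items():
        out[mask == src] = dst
    return out

result = remap(np.array([[0, 1], [4, 2]], dtype=np.uint8))
# result == [[255, 0], [3, 1]]
```

Whatever mapping you use, the indices written to disk must match what `_get_coco_stuff_meta()` declares, in the same order.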