datvuthanh / HybridNets

HybridNets: End-to-End Perception Network

add traffic sign, traffic light and pedestrian to detection #16

Closed · fjremnav closed this issue 2 years ago

fjremnav commented 2 years ago

How do I modify bdd100k.yaml so that object detection includes traffic signs, traffic lights, and pedestrians?

Thanks,

datvuthanh commented 2 years ago

Hi @fjremnav,

We'll update the dataset for multiple classes soon.

xoiga123 commented 2 years ago

In order to use multiple classes, you need to edit hybridnets/dataset.py:

# Splitting traffic lights into 4 colors
self.id_dict = {'traffic sign': 0, 'tl_green': 1, 'tl_red': 2,
                'tl_yellow': 3, 'tl_none': 4, 'person': 5}  

# Or if you just want to detect traffic lights with no color distinction, change to
self.id_dict = {'traffic sign': 0, 'traffic light': 1, 'person': 2}  
# and comment these lines out:
# if category == "traffic light":
#     color = obj['attributes']['trafficLightColor']
#     category = "tl_" + color

and projects/project.yml:

# With color
obj_list: ['traffic sign', 'tl_green', 'tl_red', 'tl_yellow', 'tl_none', 'person']

# No color
obj_list: ['traffic sign', 'traffic light', 'person']
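
For reference, here is roughly how a mapping like this ends up being applied to one BDD100K label entry. This is only a rough sketch to show the color-splitting logic from above; the dictionary literal and the standalone structure are illustrative, not the actual dataset code:

```python
# Rough sketch: turning one BDD100K detection annotation into a class index.
# `label` mimics a single object entry from a BDD100K label JSON file.
id_dict = {'traffic sign': 0, 'tl_green': 1, 'tl_red': 2,
           'tl_yellow': 3, 'tl_none': 4, 'person': 5}

label = {
    'category': 'traffic light',
    'attributes': {'trafficLightColor': 'red'},
    'box2d': {'x1': 100.0, 'y1': 50.0, 'x2': 120.0, 'y2': 90.0},
}

category = label['category']
if category == 'traffic light':
    # Same color-splitting rule as the (optional) lines in dataset.py above
    color = label['attributes']['trafficLightColor']
    category = 'tl_' + color          # e.g. 'tl_red'

if category in id_dict:
    class_idx = id_dict[category]     # 2 for a red traffic light
    print(category, class_idx)
```
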
fjremnav commented 2 years ago

There are 2 variables in dataset.py, self.id_dict and self.id_dict_single. Which one do I need to change?

self.id_dict = {'person': 0, 'rider': 1, 'car': 2, 'bus': 3, 'truck': 4, 'bike': 5, 'motor': 6, 'tl_green': 7, 'tl_red': 8, 'tl_yellow': 9, 'tl_none': 10, 'traffic sign': 11, 'train': 12}
self.id_dict_single = {'car': 0, 'bus': 1, 'truck': 2, 'train': 3}

Thanks,

xoiga123 commented 2 years ago

Thank you for your questions. We've realized that it's too troublesome to do this manually, so we've changed things a little bit.

Now, to train multiple classes, you only have to edit project.yml from:

obj_list: ['car']
obj_combine: ['car', 'bus', 'truck', 'train']  # if single class, combine these classes into 1 single class in obj_list
                                               # leave as empty list ([]) to not combine classes

to:

obj_list: ['person', 'traffic sign', 'traffic light']
obj_combine: []
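
As a rough illustration of what these two keys mean (an assumption-laden sketch, not the repository's actual loader code): everything listed in obj_combine is collapsed into the single class in obj_list, while an empty obj_combine keeps each entry of obj_list as its own class.

```python
# Hedged sketch of the obj_list / obj_combine semantics described above.
# The helper is illustrative only; HybridNets' real dataset code may differ.
def category_to_index(category, obj_list, obj_combine):
    if obj_combine and category in obj_combine:
        return 0                          # single-class mode: everything maps to obj_list[0]
    if category in obj_list:
        return obj_list.index(category)   # multi-class mode: one index per listed category
    return None                           # category is ignored during training

# Multi-class config from this comment:
print(category_to_index('person', ['person', 'traffic sign', 'traffic light'], []))   # -> 0
# Original single-class config:
print(category_to_index('truck', ['car'], ['car', 'bus', 'truck', 'train']))          # -> 0
```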

Happy training :smile:

fjremnav commented 2 years ago

I changed project.yml according to your suggestion:

obj_list: ['person', 'traffic sign', 'traffic light']
obj_combine: []

After finishing training, I ran inference using:

python hybridnets_test.py -w checkpoints/None/hybridnets-d3_0_17500.pth --source demo/image --output demo_result --imshow False --imwrite True

I got the following errors:

    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for HybridNetsBackbone:
    size mismatch for classifier.header.pointwise_conv.conv.weight: copying a param with shape torch.Size([27, 160, 1, 1]) from checkpoint, the shape in current model is torch.Size([9, 160, 1, 1]).
    size mismatch for classifier.header.pointwise_conv.conv.bias: copying a param with shape torch.Size([27]) from checkpoint, the shape in current model is torch.Size([9]).

What else do I need to change to run tests?

xoiga123 commented 2 years ago

We forgor 💀

Please pull the latest code and run inference again; it should be working fine now.

fjremnav commented 2 years ago

I pulled the latest code and ran it again. Now I get a different error:

Traceback (most recent call last):
  File "hybridnets_test.py", line 105, in <module>
    model.load_state_dict(torch.load(weight, map_location='cuda' if use_cuda else 'cpu'))
  File "/home/test/anaconda3/envs/remnav/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for HybridNetsBackbone:
    size mismatch for regressor.header.pointwise_conv.conv.weight: copying a param with shape torch.Size([36, 160, 1, 1]) from checkpoint, the shape in current model is torch.Size([3840, 160, 1, 1]).
    size mismatch for regressor.header.pointwise_conv.conv.bias: copying a param with shape torch.Size([36]) from checkpoint, the shape in current model is torch.Size([3840]).
    size mismatch for classifier.header.pointwise_conv.conv.weight: copying a param with shape torch.Size([27, 160, 1, 1]) from checkpoint, the shape in current model is torch.Size([2880, 160, 1, 1]).
    size mismatch for classifier.header.pointwise_conv.conv.bias: copying a param with shape torch.Size([27]) from checkpoint, the shape in current model is torch.Size([2880]).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "hybridnets_test.py", line 107, in <module>
    model.load_state_dict(torch.load(weight, map_location='cuda' if use_cuda else 'cpu')['model'])
KeyError: 'model'

xoiga123 commented 2 years ago

anchor_ratios = params.anchors_scales  # oops: the ratios are read from the scales field
anchor_scales = params.anchors_ratios  # and the scales from the ratios field

We did a little oopsie haha, the two assignments above were swapped.

sorry
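
Presumably the fix is just to swap those two assignments back so each hyperparameter reads from its matching field; my guess at the corrected lines (not a verbatim quote of the repository):

```python
anchor_ratios = params.anchors_ratios   # ratios from the ratios field
anchor_scales = params.anchors_scales   # scales from the scales field
```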

fjremnav commented 2 years ago

I pulled the latest code and testing seems to work OK. However, quality is quite bad on my test images with the model I trained on the bdd100k dataset. See the attached inference result for 4.jpg. Do you have trained weights with traffic light, traffic sign and car in the detection classes?

datvuthanh commented 2 years ago

Hi @fjremnav, before training your network you need to recalculate the anchor scales and anchor ratios and modify the anchor scales in backbone.py. Secondly, the object detection task is challenging to train; it took at least 250-300 epochs for the model to converge.
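
In case it helps, anchor ratios are commonly re-estimated by clustering the width/height shapes of the ground-truth boxes for the classes being trained. The snippet below is only a generic k-means sketch with placeholder boxes, not HybridNets' own anchor tooling:

```python
import numpy as np
from sklearn.cluster import KMeans

# Generic sketch: estimate anchor aspect ratios by clustering ground-truth boxes.
# `boxes_wh` would come from your BDD100K detection labels; these values are placeholders.
boxes_wh = np.array([[24, 60], [80, 40], [30, 30], [120, 45], [18, 50]], dtype=float)

ratios = (boxes_wh[:, 0] / boxes_wh[:, 1]).reshape(-1, 1)   # width / height per box
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ratios)
anchor_ratios = sorted(float(c) for c in kmeans.cluster_centers_.flatten())
print('suggested anchor ratios:', anchor_ratios)
```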

fjremnav commented 2 years ago

@datvuthanh

With the latest code, I am not able to continue training after the 1st epoch. Below is the log:

Step: 17499. Epoch: 0/500. Iteration: 17500/17500. Cls loss: 0.76407. Reg loss: 7.51139. Seg loss: 0.48101. Total loss: 8.75646: 100%|██████████| 17500/17500 [1:32:22<00:00, 3.16it/s]
checkpoint... checkpoint... checkpoint... (repeated many more times)
100%|██████████| 2500/2500 [14:49<00:00, 2.81it/s]
Killed

Have you successfully trained with traffic lights/signs and pedestrians with decent results?

xoiga123 commented 2 years ago

Hi @fjremnav, we did train them for a few epochs and it worked fine. Let's look at your log:

  1. Training completed a whole epoch. Losses are okay.
  2. It's really, really weird that checkpoint got printed an absurd number of times. We'll look into this later, but it's not our main concern here. It probably worked well enough (albeit checkpointing the same weights many times).
  3. Validating got killed at the end.

=> This is because we set nms_threshold to 0.001 to get the best recall possible in our paper. So when calculating metrics, the model has to handle a huge number of bounding boxes, leading to out-of-memory and finally killing the program before the next epoch.

That being said, there are multiple ways to circumvent this problem; choose the one that best suits you:
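
Purely as an illustration of the general idea (the tensor names and thresholds below are assumptions, not HybridNets' actual validation code), one way to keep memory in check is to drop low-confidence boxes before NMS during validation so far fewer candidates reach the metric computation:

```python
import torch
from torchvision.ops import batched_nms

# Hedged sketch: prune detections before NMS during validation.
# boxes (N, 4) in xyxy, scores (N,), labels (N,) -- placeholder random data.
boxes = torch.rand(10000, 4) * 640
boxes[:, 2:] += boxes[:, :2]                       # ensure x2 > x1 and y2 > y1
scores = torch.rand(10000)
labels = torch.randint(0, 3, (10000,))

conf_threshold = 0.05                              # higher than 0.001 -> far fewer boxes survive
keep_conf = scores > conf_threshold
boxes, scores, labels = boxes[keep_conf], scores[keep_conf], labels[keep_conf]

keep = batched_nms(boxes, scores, labels, iou_threshold=0.5)
print('boxes kept for metric computation:', keep.numel())
```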

yaoshanliang commented 1 year ago

Got that.