gau-nernst / vision-toolbox

Toolbox for vision tasks. Pre-trained vision backbones on ImageNet with PyTorch Lightning 🚀
Darknet derivative version support #2

Closed by zhiqwang 2 years ago

zhiqwang commented 2 years ago

🚀 Features and Motivation

Hi @gau-nernst, this toolbox looks very impressive!

Several Darknet derivatives are used in practical object detection, such as the popular YOLOv5 and YOLOX, which modify Darknet slightly. Do you have any plans to support these versions?

gau-nernst commented 2 years ago

Hello @zhiqwang,

Glad that you like this repo. A YOLOv5 backbone is definitely on my to-do list, but it is not my focus right now. I'm currently refactoring and improving my CenterNet implementation, which is where this repo originated. So far I have had good results with VoVNet-39, so implementing new backbones for this repo is not a priority for now.

Your YOLOv5 backbone implementation looks very nice and clean. I previously had difficulty understanding the YOLOv5 code, so I didn't implement it together with Darknet, CSPDarknet, and VoVNet. I will definitely refer to your implementation when I add the YOLOv5 backbone to my repo.

gau-nernst commented 2 years ago

Happy Chinese New Year!

Just an update: I have implemented the Darknet versions from YOLOv5. You can see them in the source code here.

I haven't checked extensively for bugs or errors in the implementation. As a quick sanity check, I counted the parameters of the official YOLOv5 models and of my implementation, and the counts match.
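A parameter-count check like this can be sketched as follows. This is only a minimal illustration, assuming both models expose the PyTorch `nn.Module` interface (`parameters()`, `numel()`, `requires_grad`); the function names `count_parameters` and `same_parameter_count` are made up for this example and are not from this repo.

```python
def count_parameters(model, trainable_only=False):
    """Sum the element counts of all parameters of a model.

    `model` is assumed to behave like a PyTorch nn.Module:
    model.parameters() yields tensors with .numel() and .requires_grad.
    """
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


def same_parameter_count(model_a, model_b):
    """Return True if both models have the same total number of parameters."""
    return count_parameters(model_a) == count_parameters(model_b)


# Hypothetical usage (both variable names are placeholders):
# same_parameter_count(official_backbone, my_backbone)
```

Matching counts only show that the two models have the same total size. They do not prove that the layers are wired identically, so this is a quick sanity check rather than a full verification.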

Note that I still use ReLU by default. I will train some of the smaller models on ImageNet this weekend or next week.

zhiqwang commented 2 years ago

Hi @gau-nernst,

I see that you have added the trained model to the repository. It's awesome!

As such I'm closing this ticket. Thanks again!

gau-nernst commented 2 years ago

No problem! Darknet YOLOv5x should finish training by tonight.

The classification performance doesn't look as good as CSPDarknet-53's, probably because of my deviations from the original YOLOv5 Darknet (ReLU vs SiLU, batch norm parameters). The YOLOv5 backbone is also tuned for object detection, so its classification performance may not be optimal. Or maybe it is just noise in the training process.
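For reference, the ReLU vs SiLU difference mentioned above can be shown with scalar versions of the two activations. This is a plain-Python sketch for illustration only, not code from this repo or from YOLOv5; SiLU is defined as x * sigmoid(x).

```python
import math


def relu(x: float) -> float:
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def silu(x: float) -> float:
    """SiLU (also called swish): x * sigmoid(x).

    The two branches are algebraically equal; splitting on the sign of x
    keeps math.exp from overflowing for large |x|.
    """
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


# ReLU discards negative inputs entirely, while SiLU lets a small
# negative value through and is smooth around zero.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  silu={silu(x):.4f}")
```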

Still, I hope the pretrained weights can be useful for quick prototyping in other vision tasks!