Thank you for the interest in our work!
We can perform mean squared error-based knowledge distillation with the bdd100k dataset. Please check out the instructions at https://github.com/NVlabs/DIODE/tree/yolo/knowledge_distillation/yolov3-master that describe how to distill a Yolo-V3 teacher model that was trained on the COCO dataset to a student Yolo-v3 network, using images from the BDD100k dataset as the proxy distillation dataset. Note that we use bdd100k as the proxy dataset for distillation because the data-free assumption in our paper means that once the teacher was trained, we discard the original dataset (hence, data-free) and are only left with the teacher model weights.
However, if you wish to train the teacher on BDD100k and then distill to a student with the BDD100k dataset, it would be better to use a more up-to-date repository for Yolo-v3: https://github.com/ultralytics/yolov3 and adapt our distillation code: https://github.com/NVlabs/DIODE/blob/yolo/knowledge_distillation/yolov3-master/utils/distill_utils.py for your purposes. Also note that you may need to convert the BDD100k dataset into the label format used by https://github.com/ultralytics/yolov3 (a rough conversion sketch follows).
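If it helps, here is a rough, untested sketch of such a conversion. The BDD100K field names (`name`, `labels`, `category`, `box2d`), the label file name, and the fixed 1280x720 image size are assumptions about the detection label JSON, so please verify them against your copy of the dataset before use:

```python
# Sketch (not from the DIODE repo): convert BDD100K detection labels to the
# per-image .txt format expected by ultralytics/yolov3.
# Assumptions: labels come as one JSON list with "name", "labels", "category"
# and "box2d" fields, and all images are 1280x720.
import json
from pathlib import Path

IMG_W, IMG_H = 1280, 720  # assumed BDD100K frame size

# Map BDD100K categories to contiguous class ids; adjust to the classes you keep.
CLASSES = ["person", "rider", "car", "bus", "truck", "bike",
           "motor", "traffic light", "traffic sign", "train"]
CLASS_TO_ID = {name: i for i, name in enumerate(CLASSES)}

def convert(json_path: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(json_path) as f:
        frames = json.load(f)
    for frame in frames:
        lines = []
        for obj in frame.get("labels") or []:
            if obj.get("category") not in CLASS_TO_ID or "box2d" not in obj:
                continue  # skip non-box annotations such as drivable area / lanes
            b = obj["box2d"]
            # Corner coordinates -> normalized center / width / height.
            xc = (b["x1"] + b["x2"]) / 2.0 / IMG_W
            yc = (b["y1"] + b["y2"]) / 2.0 / IMG_H
            w = (b["x2"] - b["x1"]) / IMG_W
            h = (b["y2"] - b["y1"]) / IMG_H
            lines.append(f"{CLASS_TO_ID[obj['category']]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
        # One label file per image, same stem as the image file.
        (out / (Path(frame["name"]).stem + ".txt")).write_text("\n".join(lines))

if __name__ == "__main__":
    convert("bdd100k_labels_images_train.json", "labels/train")  # hypothetical paths
```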
Let me know if I can help you in any other way. Thanks!
Greetings,
So I want to perform distillation between two backbones. I already have a ResNet-50 trained on BDD100K. I want to take the ResNet-50 model as the teacher and distill it into my own architecture as the student. Could you please guide me on how I can achieve this? Thanks
Hi @akshaychawla. Can you share any resources for the SelfSimilarityHook which you have in the deepinversion code? In the original code, it was not present, and I am not able to find relevant papers for it.
@AmanGoyal99 I'd like to know a little more about the problem you are trying to solve before recommending a solution and pointing you towards a snippet in our repository that might be helpful.
Typically, to make a hard decision (e.g., through an argmax layer), a network must suppress information in its output space that might reveal contextual details about the input. To distill this hidden/suppressed information into a student network, we must enhance it and then make the student network imitate the enhanced outputs using an appropriate loss function.
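For intuition only, here is a minimal sketch of that general recipe for a classification head (temperature-softened teacher outputs matched with a KL loss, in the style of Hinton et al.). It is not the detection loss used in this repository; it just illustrates "enhance the suppressed output, then imitate it":

```python
# Sketch of the generic "soften, then imitate" recipe for a classification head.
# NOT the detection loss in this repo; temperature and reduction choices are
# illustrative defaults.
import torch
import torch.nn.functional as F

def soft_target_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 4.0) -> torch.Tensor:
    # Raising the temperature flattens the teacher's softmax, exposing the
    # relative scores that an argmax (hard decision) would have thrown away.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence between the softened distributions; the t**2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)
```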
To enhance the output space, we must understand the teacher's task, output space, and loss function. Can you tell me:
Please note that this repository does not easily support loading arbitrary backbones for detection training, and it does not support training models on the BDD100k dataset. We only support distilling a pre-trained COCO Yolo-v3 model into another Yolo-v3 model, using proxy datasets such as BDD100k with their bbox labels discarded.
@animesh-007 Please refer to issue https://github.com/NVlabs/DIODE/issues/7 to discuss the self-similarity hook. Let's keep each thread restricted to a single issue as much as possible.
Greetings @akshaychawla ,
1) The ResNet-50 is used as the backbone for a Faster-RCNN.
2) There is a paper called 'Quasi-Dense Tracking for Multiple Object Tracking'. It is a tracking method that uses a Faster-RCNN and RPN with a ResNet-50 backbone.
3) This QDTrack repo was used for training: https://github.com/SysCV/qdtrack
4) My objective is to replace the ResNet-50 (trained on BDD100K) with a lighter backbone; I am just trying to obtain that lighter backbone using KD.
Please let me know if you have any queries or would like any other info about it.
Thanks
Thanks for the quick response, @AmanGoyal99. After looking at the problem, it seems that our repository is not appropriate for knowledge distillation of Faster-RCNN-based neural networks. Our repository only supports Yolo-v3 single-stage object detection models with a DarkNet backbone.
In order to distill a Faster-RCNN model, you will need to distill three components: the backbone, the RPN head, and the ROI detection head. I suggest you look at the following papers, which distill Faster-RCNN teacher and student models:
I didn't search for the code of these papers, but it should be fairly easy to find and/or implement. In our code base, there is only one file, https://github.com/NVlabs/DIODE/blob/yolo/knowledge_distillation/yolov3-master/utils/distill_utils.py, that implements the hint-learning approach described in [1] (Figure 1), which you can adapt when implementing [1].
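For reference, a rough sketch of the hint-learning idea is below: a small learned regressor maps the student's intermediate feature map into the teacher's feature space, and the two are matched with an MSE loss. The channel arguments are placeholders, it assumes the hint and guided layers share the same spatial resolution, and the cited papers add detection-specific terms on top of this:

```python
# Sketch of FitNets-style hint learning; not the exact implementation from any
# of the cited papers.
import torch
import torch.nn as nn

class HintLoss(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv regressor that maps student features into the teacher's space.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        self.mse = nn.MSELoss()

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        # Teacher features act as fixed "hints"; gradients flow only into the
        # student and the regressor.
        return self.mse(self.regressor(student_feat), teacher_feat.detach())
```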
So I just want to distill the backbone actually
Apologies for the delay @AmanGoyal99. If you just want to distill the backbone, you can use the Distillation.mse loss in this module https://github.com/NVlabs/DIODE/blob/yolo/knowledge_distillation/yolov3-master/utils/distill_utils.py in your distillation code. While this module was designed to distill the single-stage detector outputs, it should still work well for distilling just the backbone. The rest of our repository is not relevant to your particular problem.
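As a starting point, here is a minimal, untested sketch of backbone-only distillation over unlabeled proxy images. `teacher_backbone`, `student_backbone`, and `proxy_loader` are placeholders you would supply (e.g., a BDD100K image loader with the labels discarded); check utils/distill_utils.py if you prefer to reuse the repository's own Distillation.mse implementation instead of the plain nn.MSELoss used here:

```python
# Sketch of backbone-only MSE distillation on an unlabeled proxy dataset.
# All module and loader names are placeholders, not DIODE APIs.
import torch
import torch.nn as nn

def distill_backbone(teacher_backbone: nn.Module,
                     student_backbone: nn.Module,
                     proxy_loader,
                     epochs: int = 10,
                     lr: float = 1e-3,
                     device: str = "cuda") -> nn.Module:
    teacher_backbone.to(device).eval()      # teacher stays frozen
    student_backbone.to(device).train()
    optimizer = torch.optim.SGD(student_backbone.parameters(), lr=lr, momentum=0.9)
    mse = nn.MSELoss()

    for epoch in range(epochs):
        for images in proxy_loader:         # images only, bbox labels discarded
            images = images.to(device)
            with torch.no_grad():
                target = teacher_backbone(images)
            pred = student_backbone(images)  # must match the teacher's feature shape
            loss = mse(pred, target)         # add a regressor (see HintLoss above) if shapes differ
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student_backbone
```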
Greetings,
This is Aman Goyal. I am currently pursuing research at MSU in the domain of knowledge distillation, and I came across your paper and GitHub repo. I actually want to train on the BDD100K detection dataset. Is it possible to integrate it with your codebase? If yes, then please guide me on how to do it. I already have the BDD100K dataset ready.
Regards, Aman Goyal