-
I don't quite understand how knowledge distillation is implemented here.
Whisper is trained autoregressively on 680,000 hours of untagged data. According to the fourth section of …
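Not specific to this repo, but for reference, here is a minimal sketch of the usual logit-level distillation objective for a teacher/student seq2seq pair; the function name, `temperature`, `alpha`, and the assumption that padded label positions are marked with `-100` are illustrative, not taken from the code here.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.8):
    """Soft-target KL term plus standard cross-entropy on the transcripts.

    Logits have shape (batch, seq_len, vocab); `labels` marks padding with -100.
    """
    # Soft targets: match the teacher's temperature-scaled token distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hard targets: ordinary next-token cross-entropy on the (pseudo-)labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * kd + (1.0 - alpha) * ce
```

In a setup like this, the KL term transfers the teacher's output distribution while the cross-entropy term anchors the student to the transcripts, whether human-labelled or teacher-generated.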
-
Hi,
Did you reimplement it for object detection? I have tried ReviewKD on my own dataset and my own model, but found that it does not work well.
-
Hello! Thank you for your excellent work!
I encountered an issue while trying to run the DiscoNet model on the DAIR-V2X dataset using the command `python opencood/tools/train.py --hypes_yaml openco…
-
### Model description
MaskCLIP is an approach to open-vocabulary universal image segmentation. Built on pre-trained CLIP models, it negates…
-
Hi all! I just want to confirm one point of confusion.
1. Is it true that we cannot get the training accuracy because we have a teacher monitoring it?
2. Is it possible to fine-tune the DeiT with distil…
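If the second question is about DeiT's distillation token, here is a minimal sketch of the hard-label distillation objective described in the DeiT paper, assuming the model exposes separate class-token and distillation-token logits; the variable names are illustrative.

```python
import torch.nn.functional as F

def deit_hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels):
    """Class token learns from the ground-truth labels, distillation token from
    the teacher's argmax predictions; the two cross-entropies are averaged."""
    ce_cls = F.cross_entropy(cls_logits, labels)
    teacher_targets = teacher_logits.argmax(dim=-1)  # hard teacher labels
    ce_dist = F.cross_entropy(dist_logits, teacher_targets)
    return 0.5 * ce_cls + 0.5 * ce_dist
```

At inference time DeiT combines the two heads (their outputs are averaged), which is worth keeping in mind when fine-tuning.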
-
### Description
The title and author information appear twice, both above and below the abstract. In addition, the author information that appears above the abstract is not formatted correctly.
###…
-
Hi @nreimers
I would like to use the BERT-base and BERT-large cross-encoder versions trained on MS MARCO. I tried to fine-tune `"cross-encoder/ms-marco-MiniLM-L-12-v2"` on NQ and other standard datase…
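For reference (not the maintainers' recipe), here is a minimal sketch of how continued fine-tuning of that checkpoint is typically wired up with sentence-transformers, assuming a version that still exposes `CrossEncoder.fit`; the sample pairs, batch size, epochs, and output path are illustrative placeholders, not real NQ data.

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Hypothetical (query, passage) pairs with binary relevance labels.
train_samples = [
    InputExample(texts=["who wrote the iliad", "The Iliad is attributed to Homer."], label=1.0),
    InputExample(texts=["who wrote the iliad", "Paris is the capital of France."], label=0.0),
]

# Continue training the MS MARCO checkpoint on the new pairs.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2", num_labels=1)
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
model.fit(
    train_dataloader=train_dataloader,
    epochs=1,
    warmup_steps=100,
    output_path="ms-marco-minilm-finetuned-nq",  # illustrative output path
)
```

With `num_labels=1` the cross-encoder is trained to produce a single relevance score per (query, passage) pair, which matches how the MS MARCO checkpoints are used.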
-
Excuse me, I want to retrain the model on the FlyingThings3D dataset and have set the parameters according to the paper. But I noticed something strange: at step=85000 and 475000, the optical flow EPE error i…
-
### Metadata: Knowledge Distillation Meets Self-Supervision
- Authors: Guodong Xu, Ziwei Liu, Xiaoxiao Li, Chen Change Loy
- Organization: The Chinese University of Hong Kong & Nanyang Technological …
-
Hello, when reproducing your experiment (without any modifications), the accuracy increases steadily while training the backbone network, but during the distillation stage the validation and test accuracy are identical to the backbone's final epoch at every epoch. Did I make a mistake somewhere?