teacher-student-training Search Results

huggingface/trl #2217

[GKD] 0 loss

### System Info ``` pip install git+https://github.com/huggingface/transformers.git pip install tokenizers==0.20.0 pip install accelerate==0.34.2 pip install git+https://github.com/huggingface/tr…

nivibilla updated 2 weeks ago

huggingface/trl #2215

[GKD] mismatch in tensors when stacking log probs

### System Info Latest TRL from source, can't run TRL env rn as cluster is shut down but I'm installing everything from source. If required will restart cluster and run. ### Information - [ ] Th…

nivibilla updated 1 month ago

ultralytics/ultralytics #17013

Knowledge distillation with yolo11

### Search before asking - [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…

hvmatrix updated 1 week ago

ultralytics/ultralytics #15679

Student Model: Yolov8 Knowledge Distillation Implementation

Murtazaabidi1 updated 2 weeks ago

Kinyugo/consistency_models #15

Is it Consistency Distillation or Consistency Model in isola…

Hi @Kinyugo, great repo, thank you for the work! I just wanted to clarify something. From my understanding, what you have implemented here is Consistency Distillation (CD), right? The consisten…

wasphulud updated 1 month ago

winycg/CLIP-KD #13

initializing the student network with pre-trained weights

Hi Chuangguang, Great work and thanks for sharing your code! I have a question regarding the student networks in your method. From what I’ve seen, the student networks are all trained from scrat…

RuixiangZhao updated 2 weeks ago

Luffy03/VoCo #17

About the pretrain

Hi, thank you for this great work! We used the pre-training code and data provided in the current repository to run pre-training, but the performance on downstream tasks was not as strong as the VoCo_…

andyaloha updated 4 weeks ago

browsermt/students #71

Training speed Teacher to Student

Is there any data that you can share on how long it took to train the student models with the recommended setup of [4 GPUs with 12GB memory](https://github.com/browsermt/students/blob/master/train-stu…

Godnoken updated 1 year ago

qcf-568/DocTamper #78

Are you planning to release these two pre-trained files?

self.vph = torch.load('vph_imagenet.pt')
self.swin = torch.load('swin_imagenet.pt')

TwitchOnly111 updated 2 weeks ago

mozilla/translations #231

Investigate distillation quality gap

After training en-hu we noticed a somewhat larger quality gap of 4 BLEU points between the teacher and student models. It’s 24.8 for the quantized and fine-tuned student vs 30.2 BLEU for the teache…

eu9ene updated 1 month ago

1000+ results for teacher-student-training
