-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and fou…
-
Hello,
Imagen-Video states that they use model distillation to iteratively train student diffusion models that require half the sampling steps of their teacher diffusion model. This seems to be an …
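The iterative halving described there (in the spirit of progressive distillation) can be sketched as a simple step schedule. This is an illustrative helper, not code from Imagen-Video; the function name and round count are made up:

```python
def distillation_schedule(teacher_steps: int, rounds: int) -> list[int]:
    """Sampling-step counts over successive distillation rounds.

    Each round trains a student that needs half the sampling steps
    of its (now frozen) teacher, so the count halves every round.
    """
    steps = [teacher_steps]
    for _ in range(rounds):
        steps.append(steps[-1] // 2)
    return steps

# Starting from a 256-step teacher, four rounds of distillation
# yield students needing 128, 64, 32, and finally 16 steps.
print(distillation_schedule(256, 4))  # [256, 128, 64, 32, 16]
```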
-
hi,
can I use "knowledge distillation" and "dimension reduction" for BERT-large?
And if it is possible, for knowledge distillation, how many layers should remain in option2?
and for dimension …
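For context, the usual knowledge-distillation objective matches the student's temperature-softened output distribution to the teacher's. A minimal pure-Python sketch (illustrative, not tied to any particular BERT implementation; the temperature value is an assumption):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T**2 factor keeps gradient magnitudes comparable across
    temperatures (the convention from Hinton et al.'s KD paper).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

When the student's logits match the teacher's exactly, the loss is zero; any mismatch makes it positive.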
-
Thanks for your work! I have some questions about model distillation.
"we leverage the same training loop with a few exceptions: we use a larger
model as a frozen teacher, keep a spare EMA of the st…
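Keeping a separate EMA of the student, as the quoted passage mentions, usually means maintaining a shadow copy of the weights that is nudged toward the current student after every optimizer step, and evaluating/sampling with that copy. A minimal sketch over flat parameter lists (the decay value is a common default, not taken from the paper):

```python
def ema_update(ema_params, student_params, decay=0.999):
    """In-place exponential moving average of the student's parameters.

    After each optimizer step, each EMA weight is pulled a small
    fraction (1 - decay) toward the current student weight.
    """
    for i, (e, s) in enumerate(zip(ema_params, student_params)):
        ema_params[i] = decay * e + (1.0 - decay) * s
    return ema_params
```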
-
### Describe the bug
**It seems wandb crashes when I run another program using DDP.**
I have two separate Python programs, A and B. Program A uses `torch.nn.DataParallel` to run a neural network…
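A common workaround when wandb and multi-process training interfere (not a confirmed fix for this specific bug) is to initialize wandb only on the main process. Under `torchrun`/`torch.distributed`, each worker gets a `RANK` environment variable; a sketch of the guard (the project name in the comment is hypothetical):

```python
import os

def is_main_process() -> bool:
    """True on the rank-0 worker, or in a plain single-process run.

    torchrun exports RANK for each distributed worker; a
    non-distributed run (e.g. a DataParallel script) has no RANK.
    """
    return int(os.environ.get("RANK", "0")) == 0

# Guard logging so only one process talks to wandb:
# if is_main_process():
#     import wandb
#     wandb.init(project="my-project")  # hypothetical project name
```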
-
Hi.
I have a question about a part of the method that confuses me.
In the paper, Figure 3 shows the generated adversarial images being fed to the old model, which estimates a logit vector that is then used in …
-
### Request for Release of Pretrained NLLB-LLM2Vec Model
Hello Team,
Could you please release the pretrained NLLB-LLM2Vec models mentioned in your paper on "Self-Distillation for Model Stacking…
-
I checked the log, and pytorch_model_distill.pt is picked up during processing. But the latency is the same as with the EMA ckpt: 51 s on an A100. Is this normal? Is there an argument I haven't set correctly to unloc…
-
### Describe the issue
Hi,
I'm wondering how to use `ipex.optimize(...)` when I have two models, for example a teacher and a student in model distillation, but only one optimizer. Would calls like…
-
### Model/Pipeline/Scheduler description
ConsistencyTTA, introduced in the paper [_Accelerating Diffusion-Based Text-to-Audio Generation
with Consistency Distillation_](https://arxiv.org/abs/2309.…