distillation-model Search Results

1000+ results
for distillation-model

e4exp/paper_manager_abstract #618

Align before Fuse: Vision and Language Representation Learni…

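- https://arxiv.org/abs/2107.07651 - 2021 Large-scale vision and language representation learning has shown promising improvements on a variety of vision-language tasks. Most existing methods use a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens. However, **because the visual tokens and word tokens are unaligned…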

e4exp updated 3 years ago
2
zoq/arxiv-updates #484

New submissions for Tue, 4 Apr 23

## Keyword: sgd ### Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability - **Authors:** Haoyi Xiong, Xuhong Li, Boyang Yu, Zhanxing Zhu, Dongrui Wu, Dejin…

zoq updated 1 year ago
2
NVlabs/A-ViT #2

The inference time of A-Vit same as the Deit.

Thanks for this interesting work, and I believe it would be valuable for people in this area. Here, I have some problems. Could the authors provide some explanation? (1) Why the inference time o…

dk-liang updated 2 months ago
3
huggingface/transformers #24272

Finetuning Whisper with prompts

### Feature request Training code implementation for finetuning Whisper using prompts. Hi All, I’m trying to finetune Whisper by resuming its pre-training task and adding initial prompts as pa…

AvivSham updated 10 months ago
39
meta-llama/llama-stack #6

RFC-0001 - Llama Stack

As part of the Llama 3.1 release, Meta is releasing an RFC for ‘Llama Stack’, a comprehensive set of interfaces / API for ML developers building on top of Llama foundation models. We are looking for f…

raghotham updated 2 days ago
33
songmzhang/DSKD #2

About SeqKD with different vocabularies

Hello, could you please elaborate on the implementation of SeqKD? Given that the vocabularies differ, the KL loss cannot be directly applied. How did you overcome this issue? If token alignment was us…

2018cx updated 3 months ago
3
huggingface/diffusers #8547

Latent Consistency Model Possible Scaling Bug

### Describe the bug The LCM scheduler and LCM training scripts use the following formula for the $`c_{\mbox{out}}(t)`$ scaling (ignoring timestep scaling): ```math c_{\mbox{out}}(t) = \frac{t}{\…

dg845 updated 3 months ago
5
GENZITSU/UsefulMaterials #66

weekly useful materials - 08/24 -

GENZITSU updated 3 years ago
18
yandex-research/invertible-cd #5

Errors I met when running the training code on multi-GPUs

Hello! I try to run the sh_scripts/run_sd15_lora.sh on multi-GPUs, by setting "--num_processes=4", and meet the following error: [AW0701 03:52:22.574000 139785762685568 torch/distributed/elastic/mu…

lyx0208 updated 3 months ago
3
meta-introspector/meta-meme #141

Hierarchy

Designing a Multi-Layered Hierarchy of Control You I'm working on a idea for a multi-layered hierarchy of control Copilot That sounds like an interesting project! A multi-layered hierarchy of co…

jmikedupont2 updated 5 months ago
5
