-
- https://arxiv.org/abs/2107.07651
- 2021
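Large-scale vision-and-language representation learning has shown promising improvements on various vision-language tasks.
Most existing methods use a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens.
However, **because the visual tokens and word tokens are unaligned…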
e4exp updated
3 years ago
-
## Keyword: sgd
### Doubly Stochastic Models: Learning with Unbiased Label Noises and Inference Stability
- **Authors:** Haoyi Xiong, Xuhong Li, Boyang Yu, Zhanxing Zhu, Dongrui Wu, Dejin…
-
Thanks for this interesting work, and I believe it would be valuable for people in this area.
I have a few questions. Could the authors provide some explanation?
(1) Why the inference time o…
-
### Feature request
Training code implementation for finetuning Whisper using prompts.
Hi All,
I’m trying to finetune Whisper by resuming its pre-training task and adding initial prompts as pa…
-
As part of the Llama 3.1 release, Meta is releasing an RFC for ‘Llama Stack’, a comprehensive set of interfaces / API for ML developers building on top of Llama foundation models. We are looking for f…
-
Hello, could you please elaborate on the implementation of SeqKD? Given that the vocabularies differ, the KL loss cannot be directly applied. How did you overcome this issue? If token alignment was us…
-
### Describe the bug
The LCM scheduler and LCM training scripts use the following formula for the $`c_{\mbox{out}}(t)`$ scaling (ignoring timestep scaling):
```math
c_{\mbox{out}}(t) = \frac{t}{\…
```
dg845 updated
3 months ago
-
Hello! I tried to run `sh_scripts/run_sd15_lora.sh` on multiple GPUs by setting `--num_processes=4`, and got the following error:
[AW0701 03:52:22.574000 139785762685568 torch/distributed/elastic/mu…
-
Designing a Multi-Layered Hierarchy of Control
You
I'm working on an idea for a multi-layered hierarchy of control
Copilot
That sounds like an interesting project! A multi-layered hierarchy of co…