-
Commonly, the original KD loss normalizes the student and teacher logits into class probabilities before computing the KL divergence, e.g.
`ori_kd = F.kl_div(F.log_softmax(logit_s), F.softmax(logit_…
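For reference, a minimal runnable sketch of that loss with the usual temperature scaling (the temperature `T`, the `T * T` rescaling, and the helper name `kd_loss` are assumptions following the standard Hinton-style formulation, not taken from the snippet above):

```python
import torch
import torch.nn.functional as F

def kd_loss(logit_s: torch.Tensor, logit_t: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    # Soften both distributions with temperature T; kl_div expects
    # log-probabilities as its first argument and probabilities as its second.
    log_p_s = F.log_softmax(logit_s / T, dim=1)
    p_t = F.softmax(logit_t / T, dim=1)
    # reduction='batchmean' matches the mathematical definition of KL divergence;
    # the T * T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * T * T
```

Note that calling `F.log_softmax`/`F.softmax` without `dim`, as in the snippet above, emits a deprecation warning in recent PyTorch, and `F.kl_div`'s default `reduction='mean'` averages over all elements rather than over the batch.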
-
-
Hi,
Thanks so much for releasing your models and data. However, after running the following command,
I could only get a BLEU-4 score of 9.75 on WMT14 En-De.
python generate_cmlm.py data-bin/wmt14.en-d…
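One common cause of a gap this large is a tokenization mismatch between hypotheses and references at scoring time; a minimal sanity check with sacrebleu (the file names `hyp.detok.txt` and `ref.detok.txt` are placeholders) might look like:

```python
import sacrebleu

# Placeholder file names: detokenized system output and reference,
# one sentence per line, aligned by line number.
with open("hyp.detok.txt") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("ref.detok.txt") as f:
    refs = [line.rstrip("\n") for line in f]

# corpus_bleu takes the hypothesis list and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(bleu.score)
```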
-
Not really an issue; I just want to share my training code, since some people still have difficulty writing it themselves. Just modify the code to suit your use case.
Feel free to ask or poi…
-
## Keyword: efficient
### End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs
- **Authors:** Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael W. Mahoney,…
-
### News: Ah, last week was so tough.....
- [Content we couldn't cover last week (sorry)](https://github.com/jungwoo-ha/WeeklyArxivTalk/issues/75)
- Conferences
  - ICML 2023 and ACL 2023 reviews are out --> good luck, everyone!
- ICCV 2023 Supplementa…
-
It says that the 25,000 steps for pretraining and the 6,000 steps for finetuning are for warm-up only. Can I know the number of training epochs for pretraining and finetuning, including warm-up?
I have taken part of…
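A rough steps-to-epochs conversion depends on the effective batch size and the corpus size, neither of which is stated here; as a sketch with placeholder numbers:

```python
# Every value below is a hypothetical placeholder, not from the paper.
total_steps = 100_000            # total optimizer steps, warm-up included
effective_batch_size = 256       # batch size x gradient accumulation
num_train_examples = 1_000_000   # size of the training corpus

epochs = total_steps * effective_batch_size / num_train_examples
print(f"~{epochs:.1f} epochs")   # ~25.6 epochs with these placeholders
```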
-
_Written & Organized by 悦子yuezi
Issue Date: 2024/09/06_
#### 1/16 Post-Impressionism
**Impressionism**
developed in France in the 19th century and is based on the practice of pain…
-
- [ ] [I'm the author of the GPT-2 work. This is a nice post, thanks for making it more... | Hacker News](https://news.ycombinator.com/item?id=39436215)
# TITLE
I'm the author of the GPT-2 work. Thi…
-
# ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models
2023 Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA)
“oxymoron” Despite being fun to interact …