-
Hi,
May I know whether I can use sima instead of multi-head attention in the decoder to reduce complexity?
Thanks!
-
### 🔎 Search before asking
- [X] I have searched the PaddleOCR [Docs](https://paddlepaddle.github.io/PaddleOCR/) and found no similar bug report.
- [X] I have searched the PaddleOCR [Issues](https…
-
Thank you for the article. I have reproduced some of the results, and I plan to present this paper at the group meeting next week. However, I have some questions. After reading it, I feel a bit confus…
-
Hi,
Great work on this! Is Mistral supported? Right now I only see GPT-J and Llama 2.
Thank you!
-
CVPR 2022
# Format
* **Paper Title**
*Author(s)*
CVPR, 2022. [[Paper]](link) [[Code]](link) [[Website]](link)
To be filled in:
1) Paper Title
2) Author(s)
3) The three "link" placeholders
4) Leave one blank line between consecutive papers
# agent
Meta Ag…
-
The paper [ResNet Strikes back: An improved training procedure in timm](https://arxiv.org/abs/2110.00476) argues that ResNet, when trained with modern training techniques, achieves results competitive with recent models. The training techniques applied to ResNet for this purpose:
1. Data Augm…
-
At the MLPDS meeting someone brought up that in multicomponent the order of components currently matters because the learned representations are concatenated. Could we add an option to make the archit…
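
To illustrate the point about ordering, here is a minimal sketch in plain Python. The vectors and the helper names `concat_aggregate` and `sum_aggregate` are hypothetical stand-ins, not the project's actual code. It shows that concatenating per-component representations depends on their order, while an elementwise sum (one possible order-invariant alternative, assumed here for illustration) does not.

```python
# Hypothetical stand-ins for the learned representation of each component.
component_a = [1.0, 2.0]
component_b = [3.0, 5.0]


def concat_aggregate(reps):
    """Concatenate component vectors in the order given (order-dependent)."""
    return [x for rep in reps for x in rep]


def sum_aggregate(reps):
    """Sum component vectors elementwise (independent of component order)."""
    return [sum(values) for values in zip(*reps)]


# Swapping the components changes the concatenated representation...
concat_ab = concat_aggregate([component_a, component_b])
concat_ba = concat_aggregate([component_b, component_a])

# ...but leaves the summed representation unchanged.
sum_ab = sum_aggregate([component_a, component_b])
sum_ba = sum_aggregate([component_b, component_a])

print(concat_ab != concat_ba)
print(sum_ab == sum_ba)
```

A sum keeps the output dimension fixed regardless of the number of components, but it discards which component contributed what, so whether it is an acceptable option depends on the task.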
-
I utilized LLMCompressor to quantize our model using the FP8-dynamic recipe. The quantized model was successfully tested using the SparseAutoModelForCausalLM method.
![image](https://github.com/use…
-
Does anyone have an implementation of a bidirectional GPT model, similar to a Bi-LSTM?
-
[paper](https://arxiv.org/pdf/1707.06347)
## TL;DR
- **I read this because.. :** for background knowledge
- **task :** RL
- **problem :** Q-learning is too unstable, and TRPO is relatively complex. A data-efficient and scalable arch…