-
Hello, I applied FA3 when fine-tuning the Qwen2 model on an H800 machine. Under the same conditions, it tested slower than FA2.
I used FlashAttnFunc.forward in hopper/flash_attn_interface…
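For an apples-to-apples FA2 vs. FA3 comparison, it helps to time both kernels with the same harness, discarding warm-up iterations (the first calls pay for kernel compilation and allocator setup). A minimal, pure-Python sketch of such a harness; on GPU you would additionally call `torch.cuda.synchronize()` around each timed call so the measurement covers the actual kernel, not just the launch:

```python
import time
from statistics import median

def benchmark(fn, *args, warmup=3, iters=10):
    """Return the median wall-clock latency (seconds) of fn(*args),
    after `warmup` untimed calls to exclude one-time setup costs."""
    for _ in range(warmup):
        fn(*args)  # warm-up: kernel compilation, caches, allocator
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return median(times)
```

You would pass the FA2 and FA3 attention calls (with identical q/k/v shapes, dtype, and causal settings) as `fn` to rule out measurement artifacts.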
-
### Feature request
It seems there is no config for DeBERTa v1/v2/v3 as a decoder (while such configs exist for BERT/RoBERTa and similar models). This is needed in order to perform TSDAE unsupervised…
-
### Checklist
- [X] I have read the README.md and dependencies.md files
- [X] I have confirmed that no existing issue or discussion covers this BUG
- [X] I have confirmed that the problem occurs with the latest code or the stable release
- [X] I have confirmed that the problem is unrelated to the API
- [X] I have confirmed that the problem is unrelated to the WebUI
- [X] I have confirmed that the problem is unrelated to Finetune
##…
-
Thanks for your awesome work on model merging! I'm excited about the improvements you achieved compared to other merging methods. However, I noticed that the individually fine-tuned models still outperform WEM…
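For context, the simplest merging baseline that fine-tuned models are usually compared against is plain (weighted) averaging of checkpoint parameters. A minimal sketch, assuming all checkpoints share the same architecture and parameter names; this is a generic illustration, not the paper's method:

```python
def average_state_dicts(state_dicts, weights=None):
    """Element-wise weighted average of model state dicts.

    All dicts must have identical keys and matching tensor shapes.
    Works with plain numbers here; with torch tensors the same code
    applies since `+` and `*` broadcast element-wise."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n  # default: uniform average
    return {
        k: sum(w * sd[k] for w, sd in zip(weights, state_dicts))
        for k in state_dicts[0]
    }
```

Individually fine-tuned models often retain an edge on their own task because averaging trades per-task specialization for a single shared set of weights.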
-
I'm sorry to bother you. I want to ask about the difference between the two ways of obtaining pre-trained models; I'm not sure whether my understanding is correct.
**The first is in the "Getting a pre-trained model for f…
-
In axolotl, there's a config parameter you can set:
`train_on_inputs: false`
It changes the way the loss is calculated when training a LoRA: it ignores the loss on input tokens and only tra…
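The usual mechanism behind this kind of option is label masking: prompt positions get the label `-100`, the index that PyTorch's cross-entropy loss ignores, so gradients flow only from completion tokens. A minimal sketch (function name hypothetical, not axolotl's actual implementation):

```python
IGNORE_INDEX = -100  # convention: cross-entropy skips positions with this label

def build_labels(input_ids, prompt_len, train_on_inputs=False):
    """Return per-token labels for causal-LM fine-tuning.

    With train_on_inputs=False, the first `prompt_len` positions are
    masked with IGNORE_INDEX, so the loss is computed only on the
    completion tokens; with True, every token contributes to the loss."""
    if train_on_inputs:
        return list(input_ids)
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])
```

Example: for a 5-token sequence whose first 2 tokens are the prompt, the labels become `[-100, -100, t3, t4, t5]`.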
-
I had a question regarding LoRA support for image classification and segmentation. I understand that LoRA support is available for both as specified in the following tutorials:
https://github.com/hug…
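Regardless of the modality (text, classification, or segmentation), LoRA applies the same low-rank update to selected linear layers: the frozen weight `W` is augmented with a trainable product `B @ A` scaled by `alpha / r`. A minimal pure-Python sketch of the forward pass, for illustration only (not the PEFT library's implementation):

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """LoRA-adapted linear layer: y = W x + (alpha / r) * B (A x).

    Shapes: W is d_out x d_in (frozen), A is r x d_in, B is d_out x r;
    only A and B are trained, so the trainable parameter count scales
    with the rank r rather than with d_out * d_in."""
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # low-rank correction
    return [b + (alpha / r) * d for b, d in zip(base, delta)]
```

The same module wraps attention or classifier-head linears in either task; only which layers you target differs between the tutorials.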
-
Hi!
When I queue an image for the first time, it takes significantly longer than subsequent requests. The issue seems to be related to the applied providers: it shows antelopev2 and buffalo_l in th…
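This first-request penalty is typical of lazy model loading: the initial call pays for loading the provider models, after which they are served from a cache. A common workaround is to warm the cache at startup so no user request pays that cost. A generic sketch (the model names come from the log above; the loader itself is a hypothetical stand-in):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def get_model(name):
    """Load a model by name, caching the result.

    The first call per name pays the full load cost; later calls
    return the cached object near-instantly."""
    time.sleep(0.01)  # stand-in for a slow model/provider load
    return f"model:{name}"

def warm_up(names=("antelopev2", "buffalo_l")):
    """Pre-load models at startup so the first queued image is fast."""
    for n in names:
        get_model(n)
```

Calling `warm_up()` once when the server starts moves the slow load out of the request path.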
-
Hi,
Thank you for your great work.
If you don't mind, could you provide us with minimal code or instructions to reproduce the results from the paper?
Alternatively, a minimal script to run the code woul…
-
Hello, Professor!
When I fine-tune codebert, graphcodebert and unixcoder on the downstream tasks, they all fail with the same error, which is as follows: `==================== LOADING ====================
Loaded conf…