-
Hi, I encountered an issue while using Triton for LoRA finetuning of mpt-storywriter-4bit. The problem occurs when the program reaches the following line of code:
```python
self.fn.run(*args, num_…
```
-
If pad tokens are used and `model.eval(); model.train()` is called, the Unsloth backward pass becomes non-differentiable, resulting in `nan`.
Reproduction script:
```python
import torch
from transf…
```
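A small, generic helper (an illustration of my own, not part of the repro above) that can be dropped in right after `loss.backward()` to see which parameters end up with `nan` gradients:
```python
import torch

def find_nan_grads(model: torch.nn.Module) -> list[str]:
    # Return the names of parameters whose gradients contain NaNs;
    # call this right after loss.backward().
    return [
        name
        for name, p in model.named_parameters()
        if p.grad is not None and torch.isnan(p.grad).any()
    ]
```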
-
Hello,
Following the issue at https://github.com/evo-design/evo/issues/11, which discusses finetuning code for Evo, I am specifically looking for information on which frameworks could be used to opti…
-
Model export: model = AutoModel.from_pretrained(xxx)
model = llm.from_hf(model, tokenizer, dtype = "float16")
model.save(xxx)
Model loading: llm.model(xxx)
Error…
-
Running the tasks with `BAAI/bge-visualized-base-base/m3`, I get errors like the one below:
```
ERROR:mteb.evaluation.MTEB:Error while evaluating InfoSeekIT2TRetrieval: The size of tensor a (516) must m…
```
-
Hi authors,
In the SFTTrainer we set `seed = 3407`, but I find the training procedure is still non-deterministic: the test-set performance and the loss curve differ across runs with the same config.
…
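For reference, a minimal sketch of the determinism settings that usually matter beyond the trainer's `seed` argument (an illustration of the usual knobs, not a confirmed fix for this issue):
```python
import torch
from transformers import set_seed

set_seed(3407)  # seeds python, numpy, and torch (CPU and CUDA) in one call

# Prefer deterministic kernels; warn_only avoids hard errors for ops
# that have no deterministic implementation.
torch.use_deterministic_algorithms(True, warn_only=True)
torch.backends.cudnn.benchmark = False
```
Data-loader shuffling, dropout, and some CUDA matmul kernels can still introduce run-to-run variation even with these set.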
-
# LoRA: Low-Rank Adaptation of Large Language Models
[https://real-science.vercel.app/lora-low-rank-adaption-of-large-language-models](https://real-science.vercel.app/lora-low-rank-adaption-of-larg…
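A minimal sketch of the core idea: the pretrained weight stays frozen and only a low-rank update `B @ A`, scaled by `alpha / r`, is trained (a toy `LoRALinear` wrapper written here for illustration, not the reference implementation):
```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)   # random init, as in the paper
        self.B = nn.Parameter(torch.zeros(base.out_features, r))          # zero init, so the update starts at 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```
Wrapping the attention projections of a transformer with such a module and training only `A` and `B` is what makes the method parameter-efficient.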
-
I'm trying to fine-tune BGE-M3 based on the README here: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune
I originally started with the latest transformers version a few week…
-
## PaddleMIX 2.0 Officially Released
https://github.com/PaddlePaddle/PaddleMIX/tree/v2.0.0
* Multimodal understanding: added the LLaVA series, Qwen-VL, and more; added an Auto module to unify the SFT training pipeline; added the mixtoken training strategy, raising SFT throughput by 5.6x.
* Multimodal generation: released [PPDiffusers 0.24.1](./ppdiffusers…
-
https://github.com/huggingface/peft/issues/286
This issue, which describes a problem with how Alpaca LoRA saved models, is hopefully what is causing the problems. I also see the final adapter.bin f…
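For what it's worth, a small sketch of how I'd sanity-check the saved adapter with the standard peft API (the model name and target modules below are placeholders, not the ones from the linked issue):
```python
import os
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"]))

# save_pretrained() writes only the adapter weights (adapter_model.bin or
# adapter_model.safetensors depending on the peft version) plus the config.
model.save_pretrained("adapter_out")

# A near-empty adapter file is the symptom described in the linked issue,
# so check the file sizes after saving.
for name in os.listdir("adapter_out"):
    print(name, os.path.getsize(os.path.join("adapter_out", name)), "bytes")
```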