-
### Question
Has anyone carried out pretraining with Mixtral 8×7B? When I run the pretraining script, a problem occurs, as shown in the figure below. I just added a llava_mixtral.py to the ll…
-
Hi, Table 20 shows prefix FT with 2 and 4 GPUs. How were those results obtained? I tried using `MODEL=facebook/opt-13b TASK=SST2 MODE=prefix LR=1e-5 NUM_GPU=8 bash finetune_fsdp.sh`, but got some errors…
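Presumably the 2- and 4-GPU rows come from the same command with `NUM_GPU` set accordingly; this is the sweep I had in mind, assuming `finetune_fsdp.sh` reads `NUM_GPU` for the FSDP world size (a guess on my part, not verified):

```python
import os
import subprocess

# Run the same prefix-FT recipe with 2 and then 4 GPUs, assuming the
# script picks up these environment variables (as in the README example).
for num_gpu in (2, 4):
    env = {**os.environ,
           "MODEL": "facebook/opt-13b",
           "TASK": "SST2",
           "MODE": "prefix",
           "LR": "1e-5",
           "NUM_GPU": str(num_gpu)}
    subprocess.run(["bash", "finetune_fsdp.sh"], env=env, check=True)
```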
-
## Environment Preparation
```bash
# Install ms-swift
pip install git+https://github.com/modelscope/swift.git#egg=ms-swift[llm]
# Install the latest transformers…
-
Today I updated the unsloth version for the first time, to 2024.8, and found a strange phenomenon: the fine-tuning results using the 2024.4 version were very good, but the fine-tuning results using…
-
I'm finding this repo to be a user-friendly, extensible, memory-efficient solution for training/fine-tuning models. However, when it comes to inference, there is a usability gap that could be solved b…
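For illustration, this is roughly the level of friction I'd hope for at inference time; a minimal sketch assuming the fine-tuned weights can be loaded through Hugging Face `transformers` (the checkpoint path is hypothetical, and the repo's actual formats may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/finetuned-model"  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

# One call from prompt to text, with no training-side scaffolding required.
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```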
-
Whitepaper: https://arxiv.org/pdf/2306.02707.pdf
Will be released here: https://aka.ms/orca-lm
Summary: https://www.youtube.com/watch?v=Dt_UNg7Mchg
-
Dear @salman-h-khan,
Thanks for your fantastic work on GeoChat; I am really interested in it, and the checkpoint you provided works for me.
However, when I tried to reproduce it as a beginner in the …
-
I'm noticing that with v0.3.2 my CPU is getting slaughtered. The UI revamp is worse than the previous iteration, with GPU offload now hidden on the "My Models" page, but even with all the layers assigned to the GPU …
-
### What happened?
`C:\Users\ArabTech\Desktop\5\LlamaCppExe>C:/Users/ArabTech/Desktop/5\LlamaCppExe/llama-cli -m C:/Users/ArabTech/Desktop/5/phi-3.5-mini-instruct-q4_k_m.gguf -p "Who is Napoleon Bonap…`
-
[LongLoRA](https://arxiv.org/abs/2309.12307) is "an efficient fine-tuning approach that extends the context sizes of pre-trained large language models". They propose fine-tuning a model with a sparse…
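A minimal sketch of the shifted-sparse-attention grouping the paper describes (the tensor layout and function name here are illustrative assumptions, not the authors' code):

```python
import torch

def shift_group(qkv: torch.Tensor, group_size: int) -> torch.Tensor:
    """Illustrative sketch of LongLoRA-style shifted sparse attention grouping.

    qkv: (batch, seq_len, num_heads, head_dim), an assumed layout. Attention
    is computed within contiguous groups of `group_size` tokens; half of the
    heads operate on groups shifted by half a group, so information flows
    across group boundaries.
    """
    b, s, h, d = qkv.shape
    assert s % group_size == 0, "sequence length must divide into groups"
    qkv = qkv.clone()
    # Shift the second half of the heads by half a group along the sequence.
    qkv[:, :, h // 2:] = qkv[:, :, h // 2:].roll(-group_size // 2, dims=1)
    # Fold the sequence into groups; attention then runs per group, costing
    # O(s * group_size) rather than O(s^2).
    return qkv.reshape(b * (s // group_size), group_size, h, d)
```

After the per-group attention, the shifted heads would be rolled back by `group_size // 2` to restore the original token order.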