-
I'm trying to use the DeepSpeed-Chat stage-2 scripts to do RLHF with the Qwen1.8b-chat model. I changed some parts of dschat and main.py to load my model; the most significant difference is:
```
if 'Qwen' in model_nam…
```
-
Hi, I ran the following LoRA training script:
```
deepspeed fastchat/train/train_lora.py \
--deepspeed configs/deepspeed_zero3.json \
--lora_r 8 \
--lora_alpha 16 \
--lora_drop…
```
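For context on the flags above: `--lora_r` and `--lora_alpha` parameterize the low-rank update `W_eff = W + (alpha / r) * B @ A`. A dependency-free sketch of that arithmetic (tiny shapes and init values are illustrative, not FastChat's implementation):

```python
# Sketch of the LoRA delta: W_eff = W + (alpha / r) * B @ A, where
# A is r x k and B is d x r. With r=8 and alpha=16 (the flags above),
# the scaling factor alpha / r is 2.0. Pure-Python matmul for clarity.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

r, alpha = 8, 16
scale = alpha / r  # 2.0

# Tiny illustrative shapes: d = k = 2, so B is 2x8 and A is 8x2.
A = [[0.01] * 2 for _ in range(r)]  # r x k, normally Gaussian-initialized
B = [[0.0] * r for _ in range(2)]   # d x r, initialized to zero
W = [[1.0, 0.0], [0.0, 1.0]]        # frozen base weight

delta = matmul(B, A)                # d x k
W_eff = [[w + scale * d_ for w, d_ in zip(wr, dr)]
         for wr, dr in zip(W, delta)]
# Because B starts at zero, W_eff equals W, so training begins
# exactly at the base model and only A, B receive gradients.
```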
-
As we are maturing the outward-facing components of Loki, we have been contemplating a slight repository reorganisation to better encapsulate the different layers of the API and organise the re-use wi…
-
Is there a way to enable ZeRO-3 offload for LLaMA-VID?
I'm trying to integrate an LLM with higher GPU RAM usage into LLaMA-VID, which means I can't run it without offloading to CPU RAM, even at batch_size=…
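For what it's worth, ZeRO-3 offload is normally switched on in the DeepSpeed JSON config rather than in model code. A minimal sketch using DeepSpeed's documented keys (values are illustrative, not tuned for LLaMA-VID):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  },
  "train_micro_batch_size_per_gpu": 1,
  "bf16": { "enabled": true }
}
```

Whether the surrounding training script honors an arbitrary DeepSpeed config is a separate question, so this only helps if LLaMA-VID's launcher passes the config through.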
-
### Describe the issue
Issue:
Getting an error when trying to finetune the LLaVA-v1.6-34b
Command:
```
#!/bin/bash
deepspeed LLaVA/llava/train/train_mem.py \
…
```
-
Hi, I am getting the following error. Is there a limit on protein length?
n_input: 985
opening seq.aln
cuda:0
batch_size: 10
sigma: 22.5
alpha: 0.5
seq.aln opened with object id 138143134556656…
-
Hello there!
I am trying to train this model by running `!sh train_CTW1500.sh` in Google Colab, but I get this error at epoch zero:
load the vgg16 weight from ./cache
Start tra…
-
We would like to limit the running time of the CLI runner, similar to how the `action_scheduler_queue_runner_time_limit` filter works for the default runner, but we could not find a way to do thi…
-
While tuning, I am getting the following error:
AssertionError: No inf checks were recorded for this optimizer.
Can anyone help me with this?
Here are my training arguments:
per_device_train_batc…
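As a general note (an assumption about the setup, not a diagnosis of this run): `torch.cuda.amp.GradScaler` raises "No inf checks were recorded for this optimizer" when `step()` is called for an optimizer whose gradients were never produced through `scaler.scale(loss)`. A minimal sketch of the expected call order:

```python
import torch
from torch import nn

# Minimal AMP step sketch (assumed toy setup; not the asker's code).
# The "No inf checks were recorded" assertion usually fires when
# scaler.step(optimizer) runs but the backward pass never went through
# scaler.scale(loss), so the scaler recorded no inf/NaN checks.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = nn.Linear(4, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # no-op on CPU

x = torch.randn(8, 4, device=device)
y = torch.randn(8, 1, device=device)
with torch.cuda.amp.autocast(enabled=use_cuda):
    loss = nn.functional.mse_loss(model(x), y)

opt.zero_grad()
scaler.scale(loss).backward()  # scale BEFORE backward so checks get recorded
scaler.step(opt)               # step() now finds the recorded checks
scaler.update()
```

If a wrapper (e.g. a Trainer) already scales the loss internally, calling `scaler.step()` again yourself produces the same assertion.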
-
I used your code with AMP FP16 from PyTorch 1.6. I achieved good accuracy on the validation set, but the reported training accuracy is wrong. Do you have any suggestions to fix it? @xsacha @cavalleria . Th…
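One common bookkeeping pitfall, offered as a guess rather than a diagnosis of this repo's code: averaging per-batch accuracies instead of counting correct predictions over the whole epoch, which skews the reported number when the final batch is smaller (and is easy to trip over when restructuring a loop for AMP). A dependency-free sketch of count-then-divide:

```python
# Sketch: robust training-accuracy bookkeeping (hypothetical loop).
# Count correct predictions and samples, then divide once at the end;
# averaging per-batch accuracies over-weights a small final batch.
def batch_correct(logits, labels):
    """Count correct predictions in one batch (argmax over class scores)."""
    preds = [row.index(max(row)) for row in logits]
    return sum(p == y for p, y in zip(preds, labels))

batches = [
    ([[2.0, 1.0], [0.1, 3.0]], [0, 1]),
    ([[1.0, 2.0]], [0]),  # smaller final batch
]
correct = sum(batch_correct(lg, lb) for lg, lb in batches)
total = sum(len(lb) for _, lb in batches)
train_acc = correct / total  # 2 correct out of 3 samples
```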