-
Below are my parameters:
LR=6e-6
DATE=0704
EPOCH=2
MAX_LEN=1024
MASTER_PORT=8888
deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
--deepspeed deepspeed.json \
--do_train \
--do_eval \
…
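The `deepspeed.json` referenced above is not shown in the issue; a minimal ZeRO-2 sketch consistent with the flags above, written as the equivalent Python dict that `deepspeed.initialize` also accepts, might look like this (only the learning rate is taken from the flags; batch sizes are assumptions):
```
# Minimal ZeRO-2 sketch (not the issue author's actual deepspeed.json).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # assumption: not shown in the issue
    "gradient_accumulation_steps": 1,      # assumption
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 6e-6},            # matches LR=6e-6 above
    },
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}

# Inside main.py this would typically be handed to DeepSpeed like:
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
```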
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
python main.py \
--do_train \
--train_file AdvertiseGen/train.json \
--validat…
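As a quick sanity check on the data before launching `main.py`, a sketch assuming AdvertiseGen's usual JSON-lines layout with `content`/`summary` fields (the ones the ChatGLM fine-tuning example maps to prompt and response):
```
import json

# Sketch: verify every line parses and carries the expected fields.
with open("AdvertiseGen/train.json", encoding="utf-8") as f:
    for i, line in enumerate(f):
        example = json.loads(line)
        assert "content" in example and "summary" in example, f"bad line {i}"
        if i == 0:
            print(example["content"][:50], "->", example["summary"][:50])
```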
-
@huminghao16
Could you include the scripts for evaluating a pretrained model?
(for example, evaluating the large model you have included in the README).
I am running this command:
```
export …
```
-
Hi @MCC-WH. First, thanks for making your training code publicly available. I'm trying to reproduce your training results and have some questions about the provided [log file](https://github.com/MCC-W…
-
The paper says the maximum learning rate is 6.25e-5, and the Noam schedule makes it even smaller. Is such a small learning rate really necessary?
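For reference, the Noam schedule from "Attention Is All You Need" is lr = scale · d_model^(-0.5) · min(step^(-0.5), step · warmup^(-1.5)); a small sketch (d_model, warmup, and scale are illustrative, not the paper's exact values) showing that the effective LR peaks at the warmup step and only decays from there:
```
def noam_lr(step, d_model=768, warmup=4000, scale=1.0):
    """Noam schedule: warms up linearly to a peak at step == warmup,
    then decays as step**-0.5, so the effective LR sits below the
    nominal maximum for almost the whole run."""
    step = max(step, 1)
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

print(f"peak lr ~ {noam_lr(4000):.2e}")        # maximum of the schedule
print(f"lr at 100k ~ {noam_lr(100_000):.2e}")  # already ~5x smaller
```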
-
### Description
I encountered an error while trying to fine-tune the llama3 model using unsloth. The error occurs during the `trainer.train()` step, and it appears to be related to a missing Python…
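A minimal repro sketch can isolate whether the missing module surfaces at import/load time rather than inside `trainer.train()` (the checkpoint name and lengths below are assumptions, not taken from the issue):
```
# If this import or load already fails with the same missing-module error,
# the problem is the unsloth installation, not the training loop.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # hypothetical checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
print(type(model).__name__)  # reaching here means the imports resolve cleanly
```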
-
I ran the code on IMDB and got 91.6% accuracy, but the paper reports 93.7%. Is there any detail I'm missing?
-
Hello,
I was trying to reproduce the KD (knowledge distillation) results in the paper, but the results I got are somewhat worse than those reported.
Additionally, I noticed that both MiniLLM and DistiLLM have diff…
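For context, a generic token-level forward-KL distillation loss looks like the sketch below; this is only a baseline, since MiniLLM optimizes a reverse-KL objective and DistiLLM a skew-KL variant, so neither repo uses exactly this loss:
```
import torch.nn.functional as F

def forward_kl_distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Baseline token-level KD: KL(teacher || student) over the vocabulary,
    for logits of shape [batch, seq_len, vocab]."""
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(s_logp, t_prob, reduction="batchmean") * temperature ** 2
```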
-
Command:
deepspeed --hostfile=$hostfile fine-tune.py \
--report_to "none" \
--data_path "data/test.json" \
--model_name_or_path "/data/models/Baichuan2-7B-Base" \
--output_dir "output…
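Before launching the multi-node job, a single-process sanity check that the local checkpoint loads can rule out environment issues (a sketch; the dtype is an assumption, and Baichuan2 ships custom modeling code, so `trust_remote_code=True` is required on both the tokenizer and the model):
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/data/models/Baichuan2-7B-Base"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, trust_remote_code=True, torch_dtype=torch.float16)
print(model.config.model_type)
```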
-
It would be helpful to know how fast lit-llama can be trained, as this is crucial for pre-training costs. Some comparable data can be found at the link provided: [https://github.com/s-JoL/Open-Llama#和其他开…