-
Below are my parameters:
LR=6e-6
DATE=0704
EPOCH=2
MAX_LEN=1024
MASTER_PORT=8888
deepspeed --num_gpus=8 --master_port $MASTER_PORT main.py \
--deepspeed deepspeed.json \
--do_train \
--do_eval \
…
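The `deepspeed.json` referenced above is not shown in the issue; a minimal ZeRO-2 sketch consistent with the flags above, written as the equivalent Python dict that `deepspeed.initialize` also accepts, might look like this (only the learning rate is taken from the flags; batch sizes are assumptions):
```
# Minimal ZeRO-2 sketch (not the issue author's actual deepspeed.json).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # assumption: not shown in the issue
    "gradient_accumulation_steps": 1,      # assumption
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 6e-6},            # matches LR=6e-6 above
    },
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}

# Inside main.py this would typically be handed to DeepSpeed like:
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
```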
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
python main.py \
--do_train \
--train_file AdvertiseGen/train.json \
--validat…
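As a quick sanity check on the data before launching `main.py`, a sketch assuming AdvertiseGen's usual JSON-lines layout with `content`/`summary` fields (the ones the ChatGLM fine-tuning example maps to prompt and response):
```
import json

# Sketch: verify every line parses and carries the expected fields.
with open("AdvertiseGen/train.json", encoding="utf-8") as f:
    for i, line in enumerate(f):
        example = json.loads(line)
        assert "content" in example and "summary" in example, f"bad line {i}"
        if i == 0:
            print(example["content"][:50], "->", example["summary"][:50])
```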
-
@huminghao16
Could you include the scripts for evaluating a pretrained model?
(for example, evaluating the large model you have included in the README).
I am running this command:
```
export …
```
-
Hi @MCC-WH. First, thanks for making your training code publicly available. I'm trying to reproduce your training results and have some questions about the provided [log file](https://github.com/MCC-W…
-
The paper says the maximum learning rate is 6.25e-5, and the Noam schedule makes it even smaller. Is such a small learning rate really necessary?
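For reference, the Noam schedule from "Attention Is All You Need" is lr = scale · d_model^(-0.5) · min(step^(-0.5), step · warmup^(-1.5)); a small sketch (d_model, warmup, and scale are illustrative, not the paper's exact values) showing that the effective LR peaks at the warmup step and only decays from there:
```
def noam_lr(step, d_model=768, warmup=4000, scale=1.0):
    """Noam schedule: warms up linearly to a peak at step == warmup,
    then decays as step**-0.5, so the effective LR sits below the
    nominal maximum for almost the whole run."""
    step = max(step, 1)
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

print(f"peak lr ~ {noam_lr(4000):.2e}")        # maximum of the schedule
print(f"lr at 100k ~ {noam_lr(100_000):.2e}")  # already ~5x smaller
```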
-
### Description
I encountered an error while trying to fine-tune the llama3 model using unsloth. The error occurs during the `trainer.train()` step, and it appears to be related to a missing Python…
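A minimal repro sketch can isolate whether the missing module surfaces at import/load time rather than inside `trainer.train()` (the checkpoint name and lengths below are assumptions, not taken from the issue):
```
# If this import or load already fails with the same missing-module error,
# the problem is the unsloth installation, not the training loop.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # hypothetical checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
print(type(model).__name__)  # reaching here means the imports resolve cleanly
```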
-
I ran the code on IMDB and got 91.6% accuracy, but the paper reports 93.7%. Is there any detail I'm missing?
-
Hello,
I was trying to reproduce the KD (knowledge distillation) results in the paper, but the results I got are somewhat worse than those reported.
Additionally, I noticed that both MiniLLM and DistiLLM have diff…
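For context, a generic token-level forward-KL distillation loss looks like the sketch below; this is only a baseline, since MiniLLM optimizes a reverse-KL objective and DistiLLM a skew-KL variant, so neither repo uses exactly this loss:
```
import torch.nn.functional as F

def forward_kl_distill_loss(student_logits, teacher_logits, temperature=1.0):
    """Baseline token-level KD: KL(teacher || student) over the vocabulary,
    for logits of shape [batch, seq_len, vocab]."""
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(s_logp, t_prob, reduction="batchmean") * temperature ** 2
```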
-
Command:
deepspeed --hostfile=$hostfile fine-tune.py \
--report_to "none" \
--data_path "data/test.json" \
--model_name_or_path "/data/models/Baichuan2-7B-Base" \
--output_dir "output…
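Before launching the multi-node job, a single-process sanity check that the local checkpoint loads can rule out environment issues (a sketch; the dtype is an assumption, and Baichuan2 ships custom modeling code, so `trust_remote_code=True` is required on both the tokenizer and the model):
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/data/models/Baichuan2-7B-Base"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, trust_remote_code=True, torch_dtype=torch.float16)
print(model.config.model_type)
```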
-
It would be helpful to know how fast lit-llama can be trained, as this is crucial for pre-training costs. Some comparable data can be found at the link provided: [https://github.com/s-JoL/Open-Llama#和其他开…