-
Hi, I ran CPO on ALMA-7B-LoRA using the default hyperparameters in the script (learning rate), along with the parameter configuration and preference data described in the paper, but the trained model outputs large amounts of repeated preceding text, or sometimes doesn't translate at all, as shown in the image below (zh->en; raw_res is the result without applying the clean function in utils). Is there a hyperparameter I haven't set correctly? Thank you.
![image](https://github.com/user-attachments/asse…
-
I'm now trying to train Llama 3.1 with the GRIT pipeline.
At first I directly changed ``--model_name_or_path`` and ran the training code (the training script I used is as follows):
```
#!/bin/bash
#SB…
-
I appreciate your great work in zero123.
I want to retrain zero123 on medical data. My dataset contains about 700 samples, using the same data processing method as in the paper. Each sample has 12 …
-
**Is your feature request related to a problem? Please describe.**
Hello. I am a developer of Bitextor (https://github.com/bitextor/bitextor), which is based on Snakemake, and we are having issues ru…
-
This is a high level epic.
The Worker State Machine (`distributed/worker_state_machine.py`) can be exclusively updated through the `Worker.handle_stimulus` handler. _Most_ calls that change the wor…
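As a sketch of the single-entry-point design described above (a toy illustration, not the actual `WorkerState` implementation; the class and event names below are invented and the real dask stimuli are far richer):

```python
from dataclasses import dataclass, field

@dataclass
class TaskEvent:
    """Illustrative stimulus; real worker stimuli carry much more detail."""
    key: str
    kind: str  # e.g. "compute", "free"

@dataclass
class ToyWorkerState:
    """Toy state machine: every mutation funnels through handle_stimulus,
    mirroring the pattern where the state can only be updated via one handler."""
    tasks: dict = field(default_factory=dict)  # task key -> state string
    log: list = field(default_factory=list)    # audit trail of transitions

    def handle_stimulus(self, ev: TaskEvent) -> None:
        if ev.kind == "compute":
            self.tasks[ev.key] = "executing"
        elif ev.kind == "free":
            self.tasks.pop(ev.key, None)
        else:
            raise ValueError(f"unknown stimulus {ev.kind!r}")
        self.log.append((ev.key, ev.kind))
```

Routing every change through one handler makes the transitions easy to log, replay, and test deterministically.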
-
### When did you clone our code?
I cloned the code base after 5/1/23
### Describe the issue
Issue: When I use DeepSpeed ZeRO-3 to pretrain LLaVA-13B on 4 × A100 (40G), I get the error shown below. …
-
Thank you for providing this project. As the title says, I find this repo cannot support CPU offload, like this issue:
https://github.com/huggingface/diffusers/issues/2531
Could you consider adding this supp…
-
### 🚀 The feature, motivation and pitch
The DeepSeek V2 paper proposed a training methodology where both the LR and the batch size were on a scheduler.
Exact description is below, however essentia…
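For illustration, a minimal sketch of putting both quantities on a schedule (the function name, warmup shape, decay factors, and batch-size ramp below are illustrative placeholders, not the paper's exact numbers or any framework's API):

```python
def joint_schedule(step, total_steps,
                   peak_lr=2.4e-4, warmup_steps=2000,
                   min_batch=2304, max_batch=9216, ramp_frac=0.1):
    """Return (lr, batch_size) for a given training step.

    LR: linear warmup to peak, then two step decays (x0.316 each)
    at 60% and 90% of training. Batch size: linear ramp from
    min_batch to max_batch over the first ramp_frac of training.
    All constants are illustrative.
    """
    # --- learning rate: warmup, then step decay ---
    if step < warmup_steps:
        lr = peak_lr * step / warmup_steps
    else:
        lr = peak_lr
        if step >= 0.6 * total_steps:
            lr *= 0.316
        if step >= 0.9 * total_steps:
            lr *= 0.316
    # --- batch size: staged ramp early in training ---
    ramp_steps = int(ramp_frac * total_steps)
    if step >= ramp_steps:
        batch = max_batch
    else:
        batch = min_batch + (max_batch - min_batch) * step // ramp_steps
    return lr, int(batch)
```

In practice the batch-size side would need the data loader (and gradient-accumulation count) to be rebuilt at each ramp boundary, which is the part most trainers don't support out of the box.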
-
I'm working on 32k long-text SFT for Qwen2-72B. When I set **seq_parallel_world_size** to a value greater than one and **use_varlen_attn** to true, an error occurs.
After checking, the error message is a…
-
The Fluxion scheduler provides a `t_estimate` job annotation, which `flux jobs` displays by default in the generic `INFO` column for jobs in the SCHED state. This is very useful, but typically I have …