-
I ran the code with:
```
python cli.py \
--method ipet \
--data_dir ../dataset/data \
--model_type bert \
--model_name_or_path bert-base-cased \
--task_name my-task \
--output_dir…
-
Inverted Jacobian products are useful in a variety of algorithms such as the efficient implementation of [Newton's method with regularization](https://math.stackexchange.com/questions/3287587/extracti…
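To make this concrete, here is a minimal sketch of Newton's method for a 2-D system, where each update is exactly an inverse-Jacobian product J(x)⁻¹f(x) (the 2×2 inverse written out by hand; the helper name `newton_2d` and the example system are illustrative, not from the linked post):

```python
def newton_2d(f, jac, x0, tol=1e-10, max_iter=50):
    """Newton's method for a 2-D system; each step applies the
    inverse-Jacobian product J(x)^-1 f(x), with the 2x2 inverse
    written out explicitly."""
    x, y = x0
    for _ in range(max_iter):
        fx, fy = f(x, y)
        if abs(fx) < tol and abs(fy) < tol:
            break
        a, b, c, d = jac(x, y)      # J = [[a, b], [c, d]]
        det = a * d - b * c         # assumed nonsingular near the root
        # J^-1 = (1/det) * [[d, -b], [-c, a]], so J^-1 f is:
        dx = (d * fx - b * fy) / det
        dy = (a * fy - c * fx) / det
        x, y = x - dx, y - dy
    return x, y

# Example: intersect the circle x^2 + y^2 = 4 with the line x = y;
# from (1.0, 0.5) this converges to (sqrt(2), sqrt(2)).
root = newton_2d(lambda x, y: (x * x + y * y - 4, x - y),
                 lambda x, y: (2 * x, 2 * y, 1.0, -1.0),
                 (1.0, 0.5))
```

In larger systems one would solve the linear system J d = f rather than form the inverse, but the 2-D case makes the inverse-Jacobian product explicit.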
-
Exciting paper! Thank you for doing this research and publishing it.
Do you want to share some insight on what type of compute is required for training LaVi-Bridge?
Since you've used around 2M t…
-
LLaMA-13B (HF) fails with OOM on dual A100-80GB GPUs.
For those who managed to run Alpaca against the 13B model, what specs and torchrun settings did you use?
`torchrun --nproc_per_node=2 --master_po…
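For context, a rough back-of-the-envelope (assuming full fine-tuning with mixed-precision Adam and no LoRA or ZeRO sharding; the 16 bytes/param figure is a common rule of thumb, not from this thread) suggests why two 80 GB cards run out of memory:

```python
# Rough memory footprint of fully fine-tuning a 13B-parameter model
# with mixed-precision Adam (rule of thumb, not an exact figure):
# fp16 weights (2 B) + fp16 grads (2 B) + fp32 master weights (4 B)
# + fp32 Adam m (4 B) + fp32 Adam v (4 B) = 16 bytes per parameter,
# before activations and CUDA overhead.
params = 13e9
bytes_per_param = 16
total_gib = params * bytes_per_param / 1024 ** 3
print(f"{total_gib:.0f} GiB")  # -> "194 GiB", above 2 x 80 GB
```

That gap is why people typically shard optimizer state across GPUs (FSDP/ZeRO) or offload it to CPU for 13B on this hardware.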
-
Training the UNet...
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
python main.py \
--do_train \
--train_file AdvertiseGen/train.json \
--validat…
-
## 📚 Documentation
Documentation for `scatter_` and `scatter_add_` incorrectly states that "Moreover, as for gather(), the values of index must be between 0 and self.size(dim) - 1 inclusive, and a…
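Without taking a position on the exact doc wording, a pure-Python sketch of 1-D `scatter_` mechanics (not the PyTorch implementation) shows what the index values actually address — the destination along `dim`:

```python
def scatter_dim0(dest, index, src):
    """Sketch of dest.scatter_(0, index, src) for 1-D sequences:
    dest[index[i]] = src[i] for every i."""
    out = list(dest)
    for i, j in enumerate(index):
        # j addresses *dest*, so the valid range is 0 .. len(dest) - 1
        if not 0 <= j < len(out):
            raise IndexError(
                f"index value {j} out of range for destination of size {len(out)}"
            )
        out[j] = src[i]
    return out

# dest has 5 slots, so index values up to 4 are legal here:
scatter_dim0([0, 0, 0, 0, 0], [1, 3], [10, 20])  # -> [0, 10, 0, 20, 0]
```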
-
-
I want to do only training (lora is fine) for the head of the network, how do I do that? I get this error:
```bash
(beyond_scale_2_unsloth) brando9@ampere1~/beyond-scale-2-alignment-coeff $ python /…
-
Is there a way to enable zero3-offload for LLaMA-VID?
I'm trying to integrate an LLM with higher GPU RAM usage into LLaMA-VID, which means I can't run it without offloading to RAM, even at batch_size=…
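In DeepSpeed generally, ZeRO-3 with CPU offload is enabled through the config JSON; a minimal fragment is sketched below (whether LLaMA-VID's launch scripts pass such a config through unchanged is an assumption to verify against its repo):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  },
  "train_micro_batch_size_per_gpu": 1
}
```

Offloading both optimizer state and parameters to CPU trades step time for GPU memory, which matches the use case described here.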