ddp-training Search Results

1000+ results for ddp-training

microsoft/Tutel #204

Training with Data and Expert Parallelism

How should I prepare my code (data loaders, model, etc.) in order to train in both Data and Expert Parallel mode? And what changes between the "auto", "model" and "data" --parallel types? In my …

santurini updated 3 weeks ago
11
pytorch/pytorch #66046

PicklingError when saving a ddp module with torch.save()

## 🐛 Bug When saving a DDP module with torch.save(), unexpected pickling errors occurred. My model uses an encoder-decoder framework, and the encoder contains a BertModel from transformers (Huggingface)…

ccyousa updated 2 weeks ago
3
facebookresearch/detectron2 #5325

introduce torch.compile in DDP mode cause abnormal terminate…

## Instructions To Reproduce the Issue: To speed up training, I added a torch.compile operation after DistributedDataParallel in detectron2/engine/defaults.py: ``` ddp = DistributedDataParallel(mo…

proshanm updated 4 months ago
2
UKPLab/sentence-transformers #2844

Multi-GPU Training with DP or DDP combined with reentrant gr…

I am trying to train on an 8xA100 instance. If I set `trainer_arguments.gradient_checkpointing` to `True`, the training hangs for a while and then dies with a `Segmentation fault (core dumped)` error. …

olivierr42 updated 1 month ago
3
huggingface/transformers #9965

[trainer] new in pytorch: `torch.optim._multi_tensor` faster…

Back in September pytorch introduced `torch.optim._multi_tensor` https://github.com/pytorch/pytorch/pull/43507 which should be much more efficient for situations with lots of small feature tensors (`t…

stas00 updated 1 month ago
8
huggingface/trl #2003

Can DPO be used to shorten the model response length prefere…

### System Info trl official DPO examples. Finetune llama3.1 with lora. params: lora_rank: 32 lora_target: all pref_beta: 0.2 pref_loss: sigmoid ### dataset dataset: train_data template:…

hengjiUSTC updated 1 month ago
2
mingyuanzhou/SiD-LSG #3

some color spots appeared on the face

Has anyone encountered the following problem? I used SiD-LSG to distill an SDXL model (made some code adaptations to the text-encoder), and some color spots appeared on the face, which were very obvio…

koking0 updated 1 week ago
2
Lightning-AI/pytorch-lightning #5243

Returning None from training_step with multi GPU DDP trainin…

## 🐛 Bug Returning None from training_step with multi GPU DDP training freezes the training without exception ### To Reproduce Starting multi-gpu training with a None-returning training_step fu…

iamkucuk updated 1 year ago
26
wandb/wandb #5695

[Q] Using WandB Sweep + SLURM + Pytorch Lightning DDP + Mult…

I'm trying to register SLURM nodes as agents for sweeps. I'm using Pytorch Lightning with DDP and multiple GPUs. Following the recommendations from Pytorch Lightning ([here](https://lightning.ai/docs/…

OFSkean updated 5 days ago
14
catcathh/UltraPixel #16

Error when training LoRA

On a V100, since there is only 1 GPU, I changed the config file to use_ddp: False and ran python train/train_personalized.py configs/training/lora_personalization.yaml, which fails with `File "UltraPixel-main/train/train_personalized.py", line 368, in setup_opti…

fallbernana123456 updated 3 months ago
1
