-
Hi, first of all, thank you for this cool work! It's impressive, and I appreciate the effort you've put into it.
I have a question about using Flora with DDP. Have you tried using Flora to train a 7B model with DDP…
-
### Please DO NOT LINK / ATTACH YOUR PROJECT FILES HERE
**Describe the issue**
A clear and concise description of what the issue is.
**Your Setup (please complete the following information):**
…
-
Hi,
I'm having some issues with the training for blendedmvs using DDP mode.
```
Traceback (most recent call last):
  File "train.py", line 265, in <module>
    mp.spawn(main, nprocs=args.world_size, args=(…
```
-
What's the "best-practice" for configuring CachedMultipleNegativesRankingLoss when used for DDP. Say for example I have 3000 unique `positive` labels in my dataset, and I'm training using DDP on a sin…
-
I always fail at the line https://github.com/TencentARC/SEED-Story/blob/c1c08a09bfbfdfd3b1f568fc4420c6ccf83f2db5/src/train/train_clm_sft.py#L204. Could you please build a new environment and train…
-
To investigate the feasibility of, and break down the requirements for, reintroducing AppleTalk into the main Netatalk development branch.
# Background
AppleTalk / DDP and associated daemons and to…
-
```
[h264 @ 0x16543c00] Missing reference picture, default is 65562
[h264 @ 0x16543c00] mmco: unref short failure
[h264 @ 0x16543c00] mmco: unref short failure
[h264 @ 0x16543c00] Missing reference pic…
```
-
# PyTorch Distributed Training
### Parallelism strategies
1. Distributed training can be divided into model parallelism and data parallelism, depending on the parallelization strategy.
- Model parallelism: applies mainly when the model is too large to fit into the memory of a single GPU. The model is split into several parts, each loaded onto a different GPU, and trained that way.
- Data parallelism: the case you will encounter most often in practice. Each GPU holds a replica of the model, and each batch of samples is split into shards that are dispatched to the GPUs for parallel computation (see the sketch after this list)…
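As a concrete illustration of the data-parallel case, here is a minimal runnable DDP sketch; the linear model and random dataset are placeholders, and it assumes one process per GPU with the NCCL backend.

```python
# Minimal DDP data-parallel sketch (placeholder model and data; assumes 1+ GPUs).
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def worker(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Each rank holds a full replica of the model (data parallelism).
    model = DDP(torch.nn.Linear(16, 2).cuda(rank), device_ids=[rank])
    optim = torch.optim.SGD(model.parameters(), lr=0.1)

    # DistributedSampler shards the dataset so each rank sees a distinct slice.
    dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            loss = torch.nn.functional.cross_entropy(
                model(x.cuda(rank)), y.cuda(rank)
            )
            optim.zero_grad()
            loss.backward()  # gradients are all-reduced across ranks here
            optim.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, nprocs=world_size, args=(world_size,))
```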
-
## 🐛 Bug
We trained a model for several epochs on multiple nodes, and we wanted to continue training with PyTorch Lightning and LitData.
✅ When we resume training on a single device, resumption …
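For reference, a sketch of how such a resume step is typically invoked; the dataset path, checkpoint name, and tiny module here are placeholders, and the exact loader-state behavior depends on the Lightning and LitData versions involved.

```python
# A minimal resumption sketch; "s3://bucket/optimized-data" and "last.ckpt"
# are hypothetical placeholders.
import torch
import lightning as L
from litdata import StreamingDataset, StreamingDataLoader

class TinyModule(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(16, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch["x"], batch["y"]  # keys depend on how the data was optimized
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

dataset = StreamingDataset("s3://bucket/optimized-data")  # placeholder location
loader = StreamingDataLoader(dataset, batch_size=32)

trainer = L.Trainer(accelerator="gpu", devices=2, num_nodes=2, max_epochs=4)
# ckpt_path restores model weights and optimizer state; StreamingDataLoader
# also exposes state_dict()/load_state_dict() so the streaming position can
# be resumed rather than restarting the epoch from scratch.
trainer.fit(TinyModule(), loader, ckpt_path="last.ckpt")
```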
-
### Description
If you try to integrate this `EmbeddedChat` package into a React application created using `create-react-app`, as soon as you log in, you will encounter a warning or error similar t…