-
## 🐛 Bug
I ordered my training data in a specific manner and passed it to the DataLoader with `shuffle=False` (I use `reload_dataloaders_every_n_epochs=1` so I can control the ordering every epoch). Then, I found ou…
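For context, a minimal sketch of the setup being described, assuming a PyTorch Lightning training loop (`reload_dataloaders_every_n_epochs` is a Lightning `Trainer` argument); the datamodule, data, and batch size below are illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class OrderedData(pl.LightningDataModule):
    def train_dataloader(self):
        # Called again at each reload; the data is arranged in a deliberate
        # order and no shuffling is requested at the DataLoader level.
        ordered = TensorDataset(torch.arange(1000, dtype=torch.float32).unsqueeze(1))
        return DataLoader(ordered, batch_size=32, shuffle=False)


# Rebuild the dataloader at the start of every epoch.
trainer = pl.Trainer(max_epochs=5, reload_dataloaders_every_n_epochs=1)
# trainer.fit(model, datamodule=OrderedData())  # model: the LightningModule being trained
```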
-
# Motivation
Shuffles are an integral part of many distributed data manipulation algorithms. Common DataFrame operations that rely on shuffling include `sort`, `merge`, `set_index`, and various groupb…
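For illustration, a small sketch of Dask DataFrame operations of this kind (the frame and keys are arbitrary); each call can trigger a shuffle, i.e. an all-to-all repartitioning of rows across workers:

```python
import pandas as pd
import dask.dataframe as dd

df = dd.from_pandas(
    pd.DataFrame({"key": [3, 1, 2, 1, 3], "value": range(5)}),
    npartitions=2,
)

df.set_index("key")             # repartitions rows by the new index
df.merge(df, on="key")          # co-locates matching keys across partitions
df.groupby("key").value.mean()  # may shuffle when aggregating many groups
df.sort_values("value")         # a global sort requires a shuffle
```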
-
## Environment
- mosaicml-streaming==0.7.5
## To reproduce
Steps to reproduce the behavior:
1. Use `StreamingDataset` in distributed training with the same seed and set `replication` either …
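A hedged sketch of what step 1 might look like; the remote/local paths, seed, and `replication` value below are placeholders, since the exact settings are truncated above:

```python
from torch.utils.data import DataLoader
from streaming import StreamingDataset

# Same shuffle seed on every rank; replication makes groups of consecutive
# ranks receive the same samples (placeholder value here).
dataset = StreamingDataset(
    remote="s3://my-bucket/mds-data",  # placeholder remote path
    local="/tmp/mds-cache",            # placeholder local cache
    shuffle=True,
    shuffle_seed=1234,
    batch_size=8,
    replication=2,
)
loader = DataLoader(dataset, batch_size=8)
```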
-
Shuffle is a key workload for stressing Ray core's distributed dataplane. For large datasets, it requires all-to-all communication and spilling to disk. Thus, shuffle stresses the object transfer and …
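For illustration, a minimal Ray Data shuffle workload of this kind (dataset size is arbitrary):

```python
import ray

ray.init()

# random_shuffle() is an all-to-all operation: every output block can pull rows
# from every input block, exercising object transfer and, for large datasets,
# spilling to disk.
ds = ray.data.range(100_000_000)
shuffled = ds.random_shuffle()
shuffled.materialize()
```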
-
**When I am training on my custom datasets, I encounter an error at the beginning of epoch 72.**
nohup: ignoring input
W1111 14:07:33.188000 2366719 site-packages/torch/distributed/run.py:793]
…
-
## 🚀 Feature
**Motivation**
* To avoid pitfalls with shuffling and sharding of datapipes in distributed training environments (see the sketch after this list)
* To ensure a consistent experience of TorchData-based datasets ac…
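To make the first bullet concrete, a minimal sketch of the ordering that is usually recommended for datapipes consumed across multiple ranks/workers (the pipeline below is illustrative):

```python
from torchdata.datapipes.iter import IterableWrapper

# Shuffle before sharding so every rank draws from the same permutation and the
# resulting shards are disjoint; reversing the order is the classic pitfall.
pipe = IterableWrapper(range(1000))
pipe = pipe.shuffle()
pipe = pipe.sharding_filter()
```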
-
**Describe the bug**
The log settings defined by `logging_on()`, and therefore by any trollflow2 process, are not inherited by tasks scheduled using dask.distributed when called inside an `if __nam…
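As context, a hedged sketch of the usual workaround pattern: logging configured in the main process is not propagated to dask workers, so it has to be re-applied on each worker, here via `Client.run` (`configure_worker_logging` is a hypothetical stand-in for what `logging_on()` sets up):

```python
import logging

from dask.distributed import Client


def configure_worker_logging():
    # Hypothetical stand-in for the configuration performed in the main process.
    logging.basicConfig(level=logging.DEBUG)


if __name__ == "__main__":
    client = Client()                     # local cluster for illustration
    client.run(configure_worker_logging)  # execute the setup on every worker
```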
-
Hello,
We successfully fine-tuned the Mistral7b_v0.3 Instruct model using a single GPU, but we encountered issues when trying to utilize multiple GPUs.
The successful fine-tuning with one GPU (A…
-
The documentation for the `shuffle` parameter of the `dask.dataframe.DataFrame.set_index` method says:
- "Either 'disk' for single-node operation or 'tasks' for distributed operation. Will be inferred by your curr…
-
### Problem Description
On the Llama3 70B proxy model, training stalls and dumps GPU cores. The GPU core dumps are 41 GB per GPU, so I am unable to send them. It is probably easier for you all to reproduce this er…