-
I downloaded sa_000000.tar from SA-1B to try to train MASA. When I followed the tutorial to complete the dataset format conversion and started training, I got this error:
loading annotations into memory...
…
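For context, "loading annotations into memory..." is the message pycocotools prints when it starts parsing a COCO-style annotation JSON, so the failure occurs during or right after the annotation-loading step. A minimal sketch of the loading call that emits that line, with a hypothetical path for the converted file:
```python
from pycocotools.coco import COCO

# Loading a COCO-format annotation file prints
# "loading annotations into memory..." before parsing the JSON.
ann_file = "data/sa1b_coco_format.json"  # hypothetical converted SA-1B annotation file
coco = COCO(ann_file)
print(len(coco.getImgIds()), "images,", len(coco.getAnnIds()), "annotations")
```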
-
In my understanding, the pretraining code broadcasts the data from TP rank 0 to the other TP-rank GPUs.
However, if I activate the option `train_valid_test_datasets_provider.is_distributed = True` wh…
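For reference, a minimal sketch of the broadcast pattern described above, written with plain torch.distributed rather than the actual Megatron helpers (the group handle and batch layout are assumptions):
```python
import torch.distributed as dist

def broadcast_batch(batch, tp_group, src_rank):
    """Broadcast every tensor in the batch from the source TP rank to the
    other ranks of the same tensor-parallel group. Non-source ranks are
    assumed to hold same-shaped buffers that are overwritten in place."""
    for tensor in batch.values():
        dist.broadcast(tensor, src=src_rank, group=tp_group)
    return batch
```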
-
- Add a more elaborate description:
**gcube** is an R package that provides a simulation framework for biodiversity data cubes. This can start from simulating multiple species distributed in a landsc…
-
### Feature request
It would be useful to be able to straightforwardly repeat iterable datasets indefinitely, giving the user complete control over starting and ending iteration.
An It…
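As a rough workaround sketch (not the requested API), a streaming dataset can be wrapped in a generator that restarts iteration whenever the underlying stream is exhausted; the dataset name below is only illustrative:
```python
from datasets import load_dataset

def repeat_forever(iterable_ds):
    """Yield examples indefinitely by re-iterating the dataset
    each time the underlying stream is exhausted."""
    while True:
        for example in iterable_ds:
            yield example

ds = load_dataset("c4", "en", split="train", streaming=True)  # illustrative
stream = repeat_forever(ds)
first_example = next(stream)
```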
-
I have a machine with four NVIDIA L40 GPUs. I am trying to use the full_finetune_distributed recipe with the llama3_1/8B_full config. My dataset configuration in the config file is given below:
dataset:
_c…
-
**Describe the bug**
I encountered the error "OverflowError: int too big to convert" when trying to run `ilab model train` on my local system.
**To Reproduce**
Steps to reproduce the behavior:
1…
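For context (independent of `ilab` itself), CPython raises this exact message whenever an integer is packed into a fixed-width buffer that is too small, for example:
```python
# Minimal reproduction of the same CPython error message:
# converting an integer into too few bytes.
try:
    (1 << 64).to_bytes(8, "little")
except OverflowError as err:
    print(err)  # int too big to convert
```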
-
**Describe the bug**
The warning
```
2024-10-11 00:04:31,529 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released t…
-
### Describe the bug
The sharding of IterableDatasets with respect to distributed and dataloader worker processes appears problematic, with significant performance traps and inconsistencies w.r.t. d…
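For reference, the combination in question is roughly the following: an IterableDataset split across distributed ranks with split_dataset_by_node and then iterated by a multi-worker DataLoader (dataset name and sizes are illustrative):
```python
from datasets import load_dataset
from datasets.distributed import split_dataset_by_node
from torch.utils.data import DataLoader

# Sharding happens twice: once across distributed ranks and once
# across DataLoader worker processes.
ds = load_dataset("c4", "en", split="train", streaming=True)  # illustrative
ds = split_dataset_by_node(ds, rank=0, world_size=2)
loader = DataLoader(ds, batch_size=8, num_workers=4)
```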
-
# Description
When you load a dataset from HF with remote code, the `load_dataset` function prompts the user for permission to run remote code. This prompt only happens the first time the user downlo…
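For reference, the interactive prompt can also be skipped up front by passing the existing `trust_remote_code` argument (the dataset name here is only illustrative):
```python
from datasets import load_dataset

# trust_remote_code=True answers the confirmation in advance, so no
# interactive prompt is shown; the dataset name is illustrative.
ds = load_dataset("some_org/dataset_with_script", trust_remote_code=True)
```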
-
Hi, thank you for the great work!
The current implementation of the `PackedDataset` class only supports in-memory map-style datasets. When working with large datasets, the in-memory limitation can…
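A rough sketch of the kind of streaming-friendly packing this would enable, independent of the actual `PackedDataset` implementation (the token source and `max_seq_len` are illustrative):
```python
from typing import Iterable, Iterator, List

def pack_stream(token_seqs: Iterable[List[int]], max_seq_len: int) -> Iterator[List[int]]:
    """Greedily pack tokenized samples from an iterable source into
    fixed-length blocks without materializing the dataset in memory."""
    buffer: List[int] = []
    for tokens in token_seqs:
        buffer.extend(tokens)
        while len(buffer) >= max_seq_len:
            yield buffer[:max_seq_len]
            buffer = buffer[max_seq_len:]
    if buffer:  # emit the final partial block
        yield buffer
```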