distributed-datasets Search Results

1000+ results
for distributed-datasets

Best match


CrawlScript/tf_geometric #47

[Issue] Distributed training OOM with large datasets

Hi, I'm using tf_geometric in a distributed fashion for a node classification problem. My BatchGraph contains thousands of Graphs and spans several hundred gigabytes. When using the GCN …

cymarechal-devoteam updated 1 year ago
1
NVIDIA/Megatron-LM #1134

[BUG] 'NoneType' object has no attribute 'shape' error raise…

Hi, it seems that the same code works fine with the Megatron-LM that I git-cloned in April. With the latest Megatron-LM, I get the following error raised with the pretrain_gpt.py code. …

hwang2006 updated 2 hours ago
6
pytorch/ignite #1242

Handling empty datasets in distributed metric computation

## 🐛 Bug description Metric computation does not work properly in distributed settings when some processes do not handle any batch in the dataset. It becomes a problem when small validation or test…

linhr updated 3 years ago
8
pytorch/text #669

How to use datasets for distributed training?

## ❓ Questions and Help **Description** I built a dataset from my corpus, and use each line as an Example. It works fine at first until I try to use it for distributed training. It seems t…

styxjedi updated 4 years ago
4
ViTAE-Transformer/SAMRS #34

Problem with the distributed training command

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 \ --nnodes=1 --master_port=10001 --master_addr = [server ip] main_pretrain.py \ --backbone 'resnet5…

1835969208 updated 1 month ago
2
ViTAE-Transformer/SAMRS #33

Problem with the distributed training run command

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 \ --nnodes=1 --master_port=10001 --master_addr = [server ip] main_pretrain.py \ --backbone 'resnet5…

1835969208 updated 1 month ago
1
Eventual-Inc/Daft #2841

Load from Hugging Face ?

Hi ! I'm Quentin from Hugging Face :) Congrats on this project, this has the potential to help the community so much ! Especially with large scale and multimodal datasets. I was wondering if you…

lhoestq updated 2 days ago
7
taokz/BiomedGPT #32

I encountered a problem when reproducing the VQA task

After I deployed the environment as required, I encountered a problem when reproducing the VQA task. The following error occurred when running the evaluate_vqa_rad_beam_scale.sh file. I hope to get yo…

huishao007 updated 2 weeks ago
2
megvii-research/NAFNet #134

train.py: error: unrecognized arguments: --local-rank=0

I encounter this error when trying to train on the GoPro dataset: `python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 train.py -opt options/train/GoPro/NAFNet-width32.yml --launcher pyt…

davidvct updated 3 weeks ago
4
facebookresearch/ssl-data-curation #8

Running on a webdataset

Hi, Is there a simple way to run this code on a webdataset? Thanks!

nicolas-dufour updated 3 weeks ago
3
