distributed-datasets Search Results

1000+ results
for distributed-datasets


FlagOpen/FlagEmbedding #955

BGE-M3 reports an error during fine-tuning: pyarrow.lib.ArrowInvalid: offset overflow whil…

**Scenario**: Fine-tuning with BGE-M3. The .jsonl data file contains 158000 records; each record has one query, a pos list of length 1, and a neg list of length 15. **Error**: WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS envi…

MarcusEddie updated 4 months ago
1
mapme-initiative/mapme.biodiversity #186

Mirroring datasets on Source Cooperative?

Hi friends, have you seen the recently launched beta for [Source Cooperative](https://source.coop/) from the non-profit group RadiantEarth? I believe it would be possible to mirror large public-do…

cboettig updated 4 months ago
9
rapidsai/cuml #4406

[BUG] Training cuML single GPU models on dask dataframe obje…

**Describe the bug** With [PR](https://github.com/rapidsai/cuml/pull/4300) we enabled training single GPU cuML models using Dask DataFrames and Series but we use `compute` there which brings data t…

VibhuJawa updated 2 years ago
3
huggingface/optimum-neuron #681

Enable use of IterableDataset when training with DDP

### Feature request Enable use of IterableDataset when training with NeuronTrainer and DDP. Or is there a design limitation that prevents this? I can't share the project code, but see below anot…

syl-taylor-aws updated 4 hours ago
2
huggingface/transformers #34530

Inference with FSDP during training affects checkpoints

### System Info Output from `transformers-cli env`: ``` - `transformers` version: 4.45.2 - Platform: Linux-6.1.0-21-cloud-amd64-x86_64-with-glibc2.36 - Python version: 3.12.5 - Huggingfa…

pandrei7 updated 4 days ago
5
xyfJASON/ctrlora #9

Training test!

I prepared 1000 images and ran a training test. I set the max steps to 1000 and the training finished in 6 minutes, but the result is very cool! Do you have any tips for running training? ![ima…

toyxyz updated 2 weeks ago
16
deepchem/deepchem #1972

Adding DaskDataset

At present, `DiskDataset` is our workhorse class for large datasets. This class is pretty nicely optimized with a cache and everything, and I've been able to use it on 50GB datasets without too much t…

rbharath updated 4 years ago
1
OpenBeta/climbing-data #14

Clarify license/Remove unauthorized content

This repo appears to completely consist of data scraped from MountainProject user contributions. Putting an open source license on it *after* scraping without permission doesn't make it open source.

flynn-d updated 2 weeks ago
6
thunlp/OpenKE #410

Error when running distributed training with train_rotate_FB15K237_dist.py in openke2.0

Hello, I get the following error when using train_rotate_FB15K237_dist.py in openke2.0. Is there a way to fix it? I would really appreciate any help. Input Files Path : ./benchmarks/data-390/ The toolkit is importing datasets. The total of relations is 28. The total of …

pipiyapi updated 4 months ago
1
wenh06/fl-sim #5

For non-iid scenario?

Thanks for the very helpful repo. I would like to know if there is any interface for a non-iid data setting for each client? I don't find it in the src code. Thanks.

CaesarGo updated 4 months ago
1
