-
Hello,
I saved all the files (config.json, metadata.list) in UTF-8 without BOM format, but when running the training script
`bash train.sh ./data/example/config.json 1`
it always reports the
…
-
Instead of using our own task pool, we should leverage Dask distributed, as this will allow us to better consume resources from existing clusters.
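For context, a minimal sketch of what the switch could look like with `dask.distributed` (the scheduler address and task function are placeholders, not part of our codebase):
```python
# Hedged sketch: submitting work to an existing Dask cluster instead of a
# homegrown task pool. The scheduler address below is a placeholder.
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # attach to the existing cluster

def task(x):
    # Stand-in for a unit of work currently handled by our task pool.
    return x * x

# map/gather spread the tasks across cluster workers and collect the results.
futures = client.map(task, range(16))
results = client.gather(futures)
print(results)
```
Since `Client` attaches to whatever scheduler the cluster already runs, resource limits and worker placement stay under the cluster's control rather than ours.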
-
### 🐛 Describe the bug
code:
```python
from torchtext.vocab import build_vocab_from_iterator
import torchtext
from typing import Iterable, List
import random
import os
import torch
from tqdm …
-
## 🐛 Bug
There is an error when training the falcon-7b model with thunder_cudnn.
### To Reproduce
Start a docker container:
```
mkdir -p output
docker run --pull=always --gpus all --ipc=host -…
-
Hello, I would like to ask how to run MAE pretraining with multi-node, multi-GPU distributed training over the network.
Can you provide a script?
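Not an official script, but here is a minimal sketch of the generic multi-node setup with `torch.distributed`, assuming a `torchrun` launch (which exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`); the model and script name are placeholders:
```python
# Hedged sketch: generic multi-node, multi-GPU DDP boilerplate, not the
# repo's actual MAE pretraining entry point.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets these environment variables for every process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 10).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    # ... MAE pretraining loop would go here ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
Run on each node with something like `torchrun --nnodes=2 --nproc_per_node=8 --node_rank=<n> --master_addr=<head-node-ip> --master_port=29500 pretrain.py` (all values are placeholders).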
-
Dear Developers at LMFlow:
I have been using LMFlow for a long time and the experience has been great!
But recently, after cloning the **latest** LMFlow and using it to **Fine-Tune** my model, I encoun…
-
### Issue Type
Documentation Feature Request
### Source
source
### Keras Version
Keras 2.13.1
### Custom Code
Yes
### OS Platform and Distribution
Linux Ubuntu 22.04
### Python version
3.9
…
-
Thank you for your excellent work! I have some trouble with training:
I tried to install Slurm for cluster job scheduling, but unfortunately many attempts failed. So, what we want to know is if ther…
-
GraphScope leverages graphlearn-for-pytorch ([GLTorch](https://github.com/alibaba/graphlearn-for-pytorch)), a distributed GNN training framework, to facilitate large-scale GNN training. …
-
## Description
As described in [PyTorch Lightning documentation](https://pytorch-lightning.readthedocs.io/en/1.4.9/advanced/multi_gpu.html), the logs need to be synchronised using `sync_dist=True`.
…
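For reference, a minimal sketch of how this typically looks inside a `LightningModule` (the module and metric name are hypothetical):
```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class LitModel(pl.LightningModule):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(28 * 28, 10)  # placeholder network

    def forward(self, x):
        return self.net(x)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        # sync_dist=True reduces the logged value across all processes/GPUs,
        # so the metric reflects the whole distributed validation set rather
        # than a single rank's shard.
        self.log("val_loss", loss, sync_dist=True)
        return loss
```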