-
To train WebFace with 20M subjects (https://www.face-benchmark.org/), we need distributed FC training. Do you have a schedule to implement it?
https://github.com/Tencent/TFace/tree/master/tasks/distfc
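For reference while this lands: the linked TFace distfc task shards the classifier across GPUs. Below is a minimal sketch of that idea (model-parallel FC; not TFace's actual code, all names illustrative), assuming PyTorch with a NCCL process group already initialized:
```
# Sketch of distributed-FC (model-parallel softmax), NOT TFace's actual code.
# Each rank stores only its shard of the 20M-class weight matrix, so the
# classifier's memory is split across GPUs instead of replicated.
import torch
import torch.distributed as dist

class ShardedFC(torch.nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        world = dist.get_world_size()
        assert num_classes % world == 0
        # This rank owns num_classes / world rows of the weight matrix.
        self.weight = torch.nn.Parameter(
            torch.randn(num_classes // world, feat_dim) * 0.01)

    def forward(self, feats):
        # Collect the features from every rank so each weight shard scores
        # the full global batch. (Real training needs the autograd-aware
        # torch.distributed.nn.functional.all_gather for correct gradients.)
        gathered = [torch.zeros_like(feats) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, feats)
        all_feats = torch.cat(gathered, dim=0)
        # Local logits cover only this rank's class slice; a distributed
        # cross-entropy then reduces the max/sum terms across ranks.
        return all_feats @ self.weight.t()
```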
-
GraphScope leverages the distributed GNN training framework, graphlearn-for-pytorch ([GLTorch](https://github.com/alibaba/graphlearn-for-pytorch)), to facilitate large-scale distributed GNN training. …
-
Hi, is this project able to run distributed training across multiple nodes?
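In case it helps while waiting for an answer: for most PyTorch codebases, multi-node support comes down to whether the training loop is wrapped in DistributedDataParallel. A generic sketch (not this project's code), assuming a torchrun launch that sets the usual environment variables:
```
# Generic multi-node DDP pattern, launched once per node with e.g.
#   torchrun --nnodes=2 --nproc_per_node=4 --rdzv_endpoint=MASTER:29500 train.py
# torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")             # reads the env:// variables
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(512, 10).cuda(local_rank)   # stand-in for the real model
model = DDP(model, device_ids=[local_rank])         # syncs gradients across nodes
```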
-
[TensorFlow v0.8](http://www.theregister.co.uk/2016/04/14/tensorflow_08_google_release/) offers a [way to train in parallel](http://googleresearch.blogspot.com/2016/04/announcing-tensorflow-08-now-wit…
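The v0.8 mechanism was between-graph replication: describe the cluster with tf.train.ClusterSpec, start a tf.train.Server per task, and place variables on parameter servers. A minimal sketch in that TF1-era API (hostnames are hypothetical):
```
# TF 0.8-era distributed setup: one parameter server, two workers.
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],          # hypothetical hosts
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# replica_device_setter pins variables to the ps task and ops to this worker.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    w = tf.Variable(tf.zeros([784, 10]))
```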
-
Thank you for your excellent work! I have some trouble with training:
I tried to install Slurm for cluster job scheduling, but unfortunately many attempts failed. So, what we want to know is if ther…
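If the underlying question is whether training can run without Slurm: yes, as long as each node launches its own worker processes and all of them agree on a rendezvous address. A minimal sketch (addresses and sizes are assumptions), run once per node:
```
# Manual multi-node launch without any scheduler. Run this script on every
# node, passing --node-rank 0 on the master and 1, 2, ... on the others.
import argparse
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

GPUS_PER_NODE = 4                  # assumption
NUM_NODES = 2                      # assumption

def worker(local_rank, node_rank):
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://10.0.0.1:29500",        # hypothetical master address
        world_size=NUM_NODES * GPUS_PER_NODE,
        rank=node_rank * GPUS_PER_NODE + local_rank,
    )
    torch.cuda.set_device(local_rank)
    # ... build the model, wrap it in DDP, run the training loop ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--node-rank", type=int, required=True)
    args = parser.parse_args()
    mp.spawn(worker, args=(args.node_rank,), nprocs=GPUS_PER_NODE)
```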
-
I run the following train.sh on Mistral-7b:
```
accelerate launch finetune.py \
--output-dir output/yarn-mistral-7b-64k \
--model mistralai/Mistral-7B-v0.1 \
--architecture mistral \
…
```
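For context, a script run under `accelerate launch` usually only needs the Accelerator wrapper internally; a minimal sketch of that pattern (illustrative stand-ins, not the actual finetune.py):
```
# What an accelerate-launched training script typically does (sketch only).
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(4096, 4096)                      # stand-in for the LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 4096))
loader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right devices and wraps the model for DDP.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for (batch,) in loader:
    loss = model(batch).pow(2).mean()                    # dummy loss
    accelerator.backward(loss)                           # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```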
-
### Bug description
I'm working on a Slurm cluster with 8 AMD MI100 GPUs distributed across 2 nodes, with 4 GPUs in each node. I followed the instructions (https://lightning.ai/docs/pytorch/stable/clouds…
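For comparison, the Lightning side of a 2-node x 4-GPU Slurm job is normally just Trainer arguments; Lightning picks up the SLURM_* variables itself when the script is started with srun. A minimal sketch:
```
# Trainer settings for 2 nodes x 4 GPUs under Slurm (sketch).
# On Lightning < 2.0, use `import pytorch_lightning as pl` instead.
import lightning.pytorch as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,          # GPUs per node
    num_nodes=2,
    strategy="ddp",
)
# trainer.fit(model, datamodule=dm)   # model/dm come from your own code
```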
-
Hello,
When I run the code with two GPUs, I get the following error:
```
Traceback (most recent call last):
  File "/home/huntsman/repos/analysing_pii_leakage/examples/fine_tune.py", line 82, in
…
```
-
Hi, I'm trying distributed training with 2 machines. There are 4 GPUs in each machine.
On the master machine, I run:
python -u tools/run_net.py \
--cfg configs/Kinetics/SLOWFAST_8x8_R50.yaml \
--…
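For what it's worth, the multi-machine pieces in PySlowFast are the shard options (NUM_SHARDS, SHARD_ID) and an init_method reachable from both machines; the rank arithmetic those options imply looks roughly like this (addresses hypothetical, not the repo's actual code):
```
# Rank arithmetic behind a 2-machine x 4-GPU run (generic sketch).
import torch.distributed as dist

GPUS_PER_MACHINE = 4
NUM_MACHINES = 2                    # NUM_SHARDS in PySlowFast terms

def init(machine_id, local_rank):   # machine_id plays the role of SHARD_ID
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://192.168.1.10:9999",       # master machine's address
        world_size=NUM_MACHINES * GPUS_PER_MACHINE,  # 8 processes in total
        rank=machine_id * GPUS_PER_MACHINE + local_rank,
    )
```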
-
**System information**
- Have I written custom code: YES
- OS Platform and Distribution: CentOS 7.3
- TensorFlow installed from: pip
- TensorFlow version: 2.3.0
- Python version: 3.7.7
- CPU ON…
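In TF 2.3, multi-worker training goes through tf.distribute.experimental.MultiWorkerMirroredStrategy plus a TF_CONFIG environment variable; a minimal sketch (hosts hypothetical):
```
# Multi-worker setup for TF 2.3; each worker sets its own TF_CONFIG.
import json
import os
import tensorflow as tf

os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},   # hypothetical hosts
    "task": {"type": "worker", "index": 0},                  # index 1 on host2
})

# The non-experimental alias tf.distribute.MultiWorkerMirroredStrategy came later.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="adam", loss="mse")
```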