-
Hey,
I have been using Meta's implementation of Distributed Shampoo and am seeing ~20% faster convergence of transformer-based models compared to AdamW. [Simo Ryu](https://x.com/cloneofsimo/status/…
-
### 🐛 Describe the bug
```python
import torch
torch.distributed.init_process_group(backend="mpi")
nccl_group = torch.distributed.new_group(backend="nccl")
```
```
[rank0]: Traceback (most r…
```
-
Pretraining with ZeRO-3 fails with errors, but LoRA fine-tuning with ZeRO-3 works.
The error is:
python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3375, in reduce_scatter_tensor
…
-
Platforms: rocm
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_restart_pg&suite=ProcessGroupNCCLGroupTest&limit=100) and the mos…
-
Our customers use TigerGraph, not Neo4j, because TigerGraph is a distributed graph database and can support queries across multiple servers. We want Med-Graph-RAG to work on existing healthcare graphs …
-
**Is your feature request related to a problem? Please describe.**
I would like to be able to use multiple GPUs to generate multiple images at a time when using the `diffusers` backend.
**Desc…
-
### Summary of Feature
Please add an `apply` method that can transform a distributed array using a Python lambda function.
**Description:**
**Is this a blocking issue with no known work-around…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
### Version
2.0.2
### What's Wrong?
When creating ta…
-
Follow-up to https://github.com/JabRef/jabref/pull/12018
PDF from https://www.computer.org/csdl/magazine/ds/2002/02/o2001/13rRUEgs2Q8
Attention! The PDF must not be checked in to JabRef while fixing. O…
-
I have a .NET 7 Function App (Isolated Worker) that has Application Insights set up using the same instructions [documented here](https://learn.microsoft.com/en-us/azure/azure-functions/dotnet-isolated…