-
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
…
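The message above asks scripts to read the local rank from the environment rather than from a `--local-rank` command-line argument. Below is a minimal sketch of that migration. It assumes the script is started with `torchrun`, which sets `LOCAL_RANK` in each worker's environment. The helper names `get_local_rank` and `parse_args` are hypothetical and only for illustration. Keeping a `--local-rank` flag that defaults to the environment value is one possible compatibility pattern, not something the message prescribes.

```python
import argparse
import os


def get_local_rank() -> int:
    """Return this process's local rank from the LOCAL_RANK environment variable.

    Falls back to 0 when the variable is unset, e.g. for a plain single-process run.
    """
    return int(os.environ.get("LOCAL_RANK", "0"))


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical compatibility shim: still accept --local-rank for older launch
    # commands, but default it to the environment value so that torchrun-launched
    # scripts work without the flag. argparse stores it as args.local_rank.
    parser.add_argument("--local-rank", type=int, default=get_local_rank())
    return parser.parse_args(argv)
```

The environment is read when `parse_args` is called, not at import time. This means the value reflects whatever the launcher set for that worker process.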
-
https://arxiv.org/abs/1911.03001
Haifeng Liu, et al., CFS: A Distributed File System for Large Scale Container Platforms. SIGMOD '19, June 30-July 5, 2019, Amsterdam, Netherlands.
https://github.…
-
I'm deploying the Grafana Loki distributed chart, and it's failing with this error:
```
msg="error running loki" err="mkdir /data: read-only file system
error creating index client
```
These error…
-
### Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
### Branch Name
main
### Commit ID
newest
### Other Environment Information
```Markdown
- Hardware par…
```
-
### System Info
torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2
### Information
- [ ] The official example scripts
- [ ] My own…
-
### 🐛 Describe the bug
```python
import torch
import torch.distributed.elastic.multiprocessing
@torch.distributed.elastic.multiprocessing.errors.record
def Main():
    torch.distributed.…
```
-
The continental extension cookbook currently fails for me with the output shown below.
When I change the solver to AMG, the model runs without problems.
I would suggest switching this cookbo…
-
### 🐛 Describe the bug
Flex attention with FSDP works in eager mode but fails under `torch.compile`. The key error seems to be `ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)`…
-
![image](https://github.com/user-attachments/assets/50dd46e6-617d-4489-9a89-5d56886d39b9)
Is it normal that the `dpgo` window showed red errors before detecting loop closures?
Because I'm still …
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch…
```