-
### Your current environment
Collecting environment information...
PyTorch version: 2.3.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubunt…
-
Currently, our description of massive parallelism comes from sharding, but that is within a single core.
Our implementation of multi-device sharding is approximated using a slice op on host followed by…
-
At Facebook we are building a data reading framework for PyTorch which can efficiently read from data stores like Hive, MySQL, our internal blob store and any other tabular data sources. The framework…
-
Is there a reason this isn't present in linear-base? I know you can't make & (co)datatypes directly in Haskell and have to encode them, but it seems like it's worth having. I think I read somewhere th…
-
Hi,
I was wondering, are `torchtitan` and/or `DTensor` capable of model-parallel training of convolutional neural network layers? Pretty much, we want to train a GAN on very large 2D images (eventua…
-
As per the discussion on https://groups.google.com/g/sage-devel/c/R3r3G_Qrllo, opening this ticket to parallelize Boruvka's algorithm.
CC: @kliem
Component: **graph theory**
Author: **Adarsh Kis…
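
For context on what would be parallelized, here is a minimal sequential sketch of Boruvka's algorithm in plain Python. It is not the Sage implementation: the function name `boruvka_mst` and the `(u, v, weight)` edge-list input are assumptions made for illustration. The scan that finds the cheapest outgoing edge of every component is the step usually targeted for parallelization, since each edge can be examined independently of the others.

```python
def boruvka_mst(num_vertices, edges):
    """Return (total_weight, mst_edges) for an undirected weighted graph.

    `edges` is a list of (u, v, weight) tuples with vertex ids in
    range(num_vertices). For a disconnected graph the result is a
    minimum spanning forest. (Hypothetical helper, not a Sage API.)
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst_edges = []
    total_weight = 0
    num_components = num_vertices

    while num_components > 1:
        # Phase 1: for each component, find its lightest outgoing edge.
        # This loop is the candidate for parallel execution.
        cheapest = {}
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge lies inside one component
            for root in (ru, rv):
                best = cheapest.get(root)
                # Break weight ties by edge index so the selected
                # edges can never close a cycle.
                if best is None or (w, i) < (edges[best][2], best):
                    cheapest[root] = i

        if not cheapest:
            break  # remaining components are not connected to each other

        # Phase 2: contract along the selected edges.
        for i in sorted(set(cheapest.values())):
            u, v, w = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst_edges.append((u, v, w))
                total_weight += w
                num_components -= 1

    return total_weight, mst_edges
```

Each round at least halves the number of components, so there are at most O(log V) rounds, each dominated by the O(E) scan in phase 1.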
-
I am debugging a data-parallel forward mismatch when using `megablocks` (DP and non-DP give different forward results). During debugging, I tried to reproduce the difference minimally, and found that…
-
I did some extensive investigation, testing and benchmarking, and determined that the following is needed to speed up inference for the Bigcode models (and most text-gen-inference models):
1. **Use …
-
### Your current environment
```text
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12…
-
### 🐛 Describe the bug
Hello,
I am running llama3-70b and mixtral with vLLM on a bunch of different kinds of machines. I encountered wildly different quality performance on A10 GPUs vs A100/H…