-
Hi,
we are trying to run two instances of cblas_dgemm in parallel. If the total number of threads is 16, we would like each instance to run using 8 threads. Currently, we are using a structure like …
-
Hello. I am curious if it would be possible to implement Megatron-style sequence parallelism in the repository.
Sequence parallelism is important for reducing activation memory, which is difficult to…
-
Firstly, thank you for a great repository.
I have a question regarding parallelism using whisper-live vs. faster-whisper on a single GPU. In this faster-whisper [issue](https://github.com/SYSTRAN/f…
-
I have a quantized model that is too large to fit in one GPU, and does fit in 2 GPUs. I have 4 GPUs, so the most efficient configuration is to replicate the model and use data parallel on 2 processes …
-
I am trying to set up a dynamic kernel wherein a KA kernel launches a CUDA kernel. The final objective would be to have dynamic parallelism using only kernel abstractions. This is an MWE showing the c…
-
In prod, we regularly see the search queue becoming "blocked" by long-running mismatch searches.
We have various dev tickets around this: see #1282
However, we could potentially ameliorate the issue…
-
Ben, great work here, appreciate the investment of your time.
I've been seeing what appear to be client-server serviceability issues when stress testing using this package as a way to process separate …
-
https://cython.readthedocs.io/en/latest/src/userguide/parallelism.html
-
### 🚀 The feature, motivation and pitch
### Motivation
SPMD sharding in PyTorch/XLA offers model parallelism by sharding tensors within an operator. However, we need a mechanism to integrate thi…