-
I've been using `atq.INT4_AWQ_CFG` and observing a performance drop when quantizing a Llama 70B model with tensor parallelism via `atq.quantize(model, quant_cfg, forward_loop=calibrate_loop)`.
Quan…
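For context, a minimal sketch of how that call is usually wired up, assuming `atq` is NVIDIA's AMMO / TensorRT Model Optimizer quantization module; the import path, checkpoint name, and calibration prompts below are placeholders, not taken from the report:

```python
# Hedged sketch, not the reporter's exact code.
import torch
import ammo.torch.quantization as atq  # assumption: AMMO import; newer releases ship this as modelopt.torch.quantization
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-70B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder calibration set; real runs would use a few hundred representative prompts.
calib_prompts = [
    "The capital of France is",
    "Explain tensor parallelism in one sentence.",
]

def calibrate_loop(*_):
    # Forward a few batches so activation statistics can be collected.
    # Signature kept permissive: some releases pass the model into forward_loop.
    with torch.no_grad():
        for prompt in calib_prompts:
            ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
            model(ids)

quant_cfg = atq.INT4_AWQ_CFG  # the config named in the report
model = atq.quantize(model, quant_cfg, forward_loop=calibrate_loop)
```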
-
Jira Link: [DB-13522](https://yugabyte.atlassian.net/browse/DB-13522)
Relevant config options in ora2pg for Table Level Parallelism
```
# This configuration directive adds multiprocess support…
```
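For reference, a hedged sketch of the `ora2pg.conf` directives that typically control parallelism; the directive names come from the stock configuration file, and the values are illustrative only:

```
# Parallel processes used to write/import the exported data.
JOBS             4
# Parallel Oracle connections used to extract data from a single table.
ORACLE_COPIES    4
# Number of tables exported/processed concurrently.
PARALLEL_TABLES  4
```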
-
I see the feature described in https://github.com/vllm-project/vllm/issues/7519. I want to know when this feature will be available to try.
-
Hi, thanks for your great work! The issue I am concerned about is deployment-time parallelism compared to Lookahead. As far as I know, Lookahead currently does not support tensor parallelism, which i…
-
This blog post provides a good background / introduction: https://threedots.tech/post/go-test-parallelism/
When I ran the following command
```shell
GOMAXPROCS=1 go test -parallel 128 -p 16 -json ./.…
```
-
Can I run multiple environments in parallel when using Raisim Unreal as the rendering engine?
-
With the recent advent of large models (take Llama 3.1 405b, for example!), distributed inference support is a must! We currently support naive device mapping, which works by allowing a combination of…
-
SSAs are "embarrassingly" parallel on the simulation level, meaning that N independent simulations can be run at the same time. I don't know how high N is in practice for the users of `JumpProcesses` …
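To make the "embarrassingly parallel at the simulation level" point concrete, here is a language-agnostic illustration sketched in Python (not the JumpProcesses API): N independent SSA trajectories of a toy birth-death process mapped over a process pool, with the model and all parameters made up for the example:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def ssa_birth_death(seed, x0=10, birth=2.0, death=0.1, t_end=50.0):
    """One independent SSA trajectory of a toy birth-death process."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    while t < t_end:
        rates = np.array([birth, death * x])
        total = rates.sum()
        if total == 0:
            break
        t += rng.exponential(1.0 / total)       # exponential waiting time to the next event
        if rng.random() < rates[0] / total:     # pick which event fires
            x += 1
        else:
            x -= 1
    return x

if __name__ == "__main__":
    # The simulations share nothing, so they scale across however many workers are available.
    with ProcessPoolExecutor() as pool:
        finals = list(pool.map(ssa_birth_death, range(1000)))  # 1000 independent trajectories
    print(np.mean(finals))
```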
-
## Problem Description
When trying to use pipeline parallelism in tensorrt-llm on 2+ NVIDIA GPUs, I encounter ```AssertionError: Expected but not provided tensors:{'transformer.vocab_embedding.weig…
-