-
A couple of issues with the new tensor parallelism implementation!
1) Tensor parallelism doesn't appear to respect the absence of flash attention, even when it is disabled via the -nfa flag. It also doesn't document flash att…
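For reference, a minimal sketch (not the project's actual code) of the kind of gating the report implies is missing, assuming a hypothetical `no_flash_attn` flag that the tensor-parallel path would need to honor by falling back to plain scaled-dot-product attention:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, no_flash_attn: bool = False):
    """Pick an attention path based on a hypothetical no_flash_attn flag.

    The report suggests the tensor-parallel code ignores this flag and
    always takes the fused/flash-attention route.
    """
    if no_flash_attn:
        # Plain math fallback: softmax(QK^T / sqrt(d)) V
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return torch.softmax(scores, dim=-1) @ v
    # Fused kernel path (uses flash attention when available)
    return F.scaled_dot_product_attention(q, k, v)
```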
-
Hi
I am trying to run the API server with tensor parallelism across either 2 or 4 GPUs, using the following command:
```bash
python -m slora.server.api_server --max_total…
```
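One sanity check worth doing before launching across 2 or 4 GPUs is confirming that the process actually sees the devices you intend to use (e.g. after setting `CUDA_VISIBLE_DEVICES`). A minimal sketch:

```python
import os
import torch

# Restrict the server to the GPUs intended for tensor parallelism;
# this must be set before CUDA is initialized in the process.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

n = torch.cuda.device_count()
print(f"visible GPUs: {n}")
for i in range(n):
    print(i, torch.cuda.get_device_name(i))
assert n in (2, 4), "tensor-parallel world size must match visible GPUs"
```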
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.…
-
With 2x 3090s, does the recently added tensor parallelism use NVLink in any manner? Thanks!
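For what it's worth, you can check whether NVLink is present and active between the two cards yourself, e.g. with `nvidia-smi topo -m` on the command line, or programmatically via NVML. A small sketch using the `nvidia-ml-py` (`pynvml`) bindings:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Query each potential NVLink lane on GPU 0; an active lane to the peer
# 3090 indicates the bridge is installed and usable.
for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
    try:
        state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
        print(f"link {link}: {'active' if state else 'inactive'}")
    except pynvml.NVMLError:
        break  # no more links on this device

pynvml.nvmlShutdown()
```

Whether the tensor-parallel all-reduces actually travel over NVLink then depends on the communication backend detecting peer-to-peer access (e.g. NCCL, with `NCCL_P2P_DISABLE` unset).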
-
**Is your feature request related to a problem? Please describe.**
I am aware that PyTriton already has an example for using PyTriton with tensorrt_llm. But I noticed that the example only support s…
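For context, a PyTriton deployment is essentially a Python callable bound to the Triton server. A minimal sketch (the inference body and tensor names here are placeholders, not the tensorrt_llm example's actual code):

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def infer_fn(prompts):
    # Placeholder body: a real deployment would run the tensorrt_llm
    # engine here and return its generations instead of echoing input.
    return {"outputs": prompts}

with Triton() as triton:
    triton.bind(
        model_name="llm",
        infer_func=infer_fn,
        inputs=[Tensor(name="prompts", dtype=np.bytes_, shape=(1,))],
        outputs=[Tensor(name="outputs", dtype=np.bytes_, shape=(1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()
```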
-
Hello,
I am encountering an issue related to my understanding of tensor parallelism in the PIM (Processing In Memory) model.
Specifically, I noticed a discrepancy in the Key-Value (KV) cache all…
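As a point of reference for the allocation math: under tensor parallelism the KV heads are sharded across devices, so the per-device cache should shrink by the TP degree. A back-of-the-envelope sketch (the sizes are illustrative, not taken from the PIM model):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch,
                   dtype_bytes=2, tp=1):
    """Per-device KV cache size: 2 tensors (K and V) per layer,
    with the KV heads split evenly across tp devices."""
    return 2 * layers * (kv_heads // tp) * head_dim * seq_len * batch * dtype_bytes

# Illustrative 7B-class config: 32 layers, 32 KV heads, head_dim 128, fp16
full = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=1)
tp4 = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=1, tp=4)
print(f"tp=1: {full / 2**30:.2f} GiB per device")  # 2.00 GiB
print(f"tp=4: {tp4 / 2**30:.2f} GiB per device")   # 0.50 GiB
```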
-
### Your current environment
I have a server with only one NVLink connection, so I need to use pipeline parallelism and tensor parallelism within a single node to improve its performance. I would lik…
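For reference, recent vLLM releases let you combine the two within a node by setting both parallel sizes. A minimal sketch (the model name is just an example):

```python
from vllm import LLM, SamplingParams

# 4 GPUs total: 2-way tensor parallel within each pipeline stage,
# 2 pipeline stages. Requires a vLLM version with offline
# pipeline-parallel support.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    tensor_parallel_size=2,
    pipeline_parallel_size=2,
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```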
-
With the recent advent of large models (take Llama 3.1 405b, for example!), distributed inference support is a must! We currently support naive device mapping, which works by allowing a combination of…
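To make the baseline concrete, naive device mapping pins contiguous blocks of layers to devices and moves activations between them at the block boundaries. A rough sketch of the idea (not the actual implementation):

```python
import torch
import torch.nn as nn

class NaiveDeviceMap(nn.Module):
    """Split a stack of layers across devices; each block runs on its
    own device and activations hop between devices sequentially, with
    no overlap (which is why tensor parallelism is more attractive)."""

    def __init__(self, layers, devices):
        super().__init__()
        per = (len(layers) + len(devices) - 1) // len(devices)
        self.devices = devices
        self.blocks = nn.ModuleList(
            nn.Sequential(*layers[i * per:(i + 1) * per]).to(dev)
            for i, dev in enumerate(devices)
        )

    def forward(self, x):
        for block, dev in zip(self.blocks, self.devices):
            x = block(x.to(dev))
        return x
```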
-
### Bug description
For some reason, the tensor-parallel implementation generates nonsensical outputs:
```
⚡ python-api-tensor-parallel ~/litgpt litgpt generate_tp checkpoints/microsoft/phi-2
…
```
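One classic way tensor-parallel generation turns to gibberish is a missing reduction: each rank holds only a partial matmul result, and the partials are garbage until they are summed. A self-contained sketch of that decomposition (illustrative, not litgpt's code):

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 8)   # activations
w = torch.randn(8, 4)   # full weight of a linear layer

full = x @ w                           # single-device reference

# Row-parallel split: each "rank" gets half the input features
x0, x1 = x.chunk(2, dim=1)
w0, w1 = w.chunk(2, dim=0)
partial0, partial1 = x0 @ w0, x1 @ w1  # per-rank partial products

# Correct only after the all-reduce (here: an explicit sum)
print(torch.allclose(full, partial0 + partial1, atol=1e-6))  # True
print(torch.allclose(full, partial0, atol=1e-6))             # False: unreduced
```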
-
### 🚀 The feature, motivation and pitch
I don't know if it's feasible or worthwhile to merge [this](https://github.com/IBM/vllm/tree/9855b99502c7537db5ef018129e603650800ac46), as maybe the trees ar…