-
### System Info
- GPU: 4 × 3090 (24 GB)
- TensorRT-LLM version: 0.7.1, built from the source released last week
- TensorRT version: 9.2.0.post12.dev5
- NVIDIA Driver: Driver Version: 535.54.03 CUDA Versio…
-
# +34% higher throughput?
TL;DR: Watching vLLM has been really fascinating! @oleitersdorf and I investigated whether we could further accelerate vLLM by profiling its performance with GPU counters. Curr…
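The write-up is truncated here, but as a rough, hypothetical sketch of the kind of GPU-side profiling it describes, the snippet below wraps a vLLM generation call in `torch.profiler` to list the dominant CUDA kernels. The model name and prompt are placeholders, and PyTorch kernel timings stand in for whatever hardware counters the authors actually collected.

```
# Hypothetical sketch: profile CUDA kernel time around a vLLM generation call.
# Requires vLLM and a CUDA GPU; the model and prompt are placeholders.
from torch.profiler import profile, ProfilerActivity
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small placeholder model
params = SamplingParams(max_tokens=64)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    llm.generate(["Profiling vLLM with GPU counters"], params)

# Show which CUDA kernels dominate the run.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```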
-
Hello.
[This](https://github.com/predibase/lorax/blob/309618cdb4cbc1807a6ce837a9f49062896f027b/server/lorax_server/utils/layers.py#L522) check holds when the adapter's rank is at least 8 × num_shards (…
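The parenthetical is cut off, but the stated condition itself is easy to illustrate. Below is a hypothetical helper (not the actual LoRAX code at the linked line) that encodes it: the check passes only when the adapter rank is at least 8 × num_shards, i.e. each tensor-parallel shard gets a rank slice of at least 8.

```
# Hypothetical illustration of the condition described above,
# not the actual LoRAX implementation.
def rank_check_holds(adapter_rank: int, num_shards: int) -> bool:
    """Check passes when every shard gets a rank slice of at least 8."""
    return adapter_rank >= 8 * num_shards

# Example: a rank-16 adapter passes with 2 shards but not with 4.
assert rank_check_holds(16, 2)
assert not rank_check_holds(16, 4)
```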
-
Is there a way to accelerate inference of large models across multiple cores? The current approach is to distribute an operator's work, such as GEMM and GEMV, across multiple cores, or to split the mode…
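The question is cut off, but as a minimal sketch of the first idea mentioned (splitting one operator's work across cores), here is a hypothetical row-partitioned matrix-vector product using a thread pool. It is only an illustration; real runtimes do this inside optimized GEMM/GEMV kernels rather than in Python.

```
# Hypothetical sketch: split a GEMV across CPU cores by partitioning the
# rows of the weight matrix. Illustrative only.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def parallel_gemv(W: np.ndarray, x: np.ndarray, num_workers: int = 4) -> np.ndarray:
    # Each worker computes the output entries for one contiguous block of rows.
    blocks = np.array_split(np.arange(W.shape[0]), num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = pool.map(lambda rows: W[rows] @ x, blocks)
    return np.concatenate(list(partials))

W = np.random.randn(4096, 4096).astype(np.float32)
x = np.random.randn(4096).astype(np.float32)
np.testing.assert_allclose(parallel_gemv(W, x), W @ x, rtol=1e-4, atol=1e-4)
```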
-
Reopening the issue about `gemma-7b` prediction values.
This issue is still not solved: the perplexity values of gemma-2b and gemma-7b are very different, with gemma-7b much worse (near random). Wikitext-v2 token pe…
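The numbers are cut off, but for context, Wikitext-2 token perplexity is typically computed along the lines below. This is a generic sketch with Hugging Face transformers, not the reporter's exact evaluation; the model id, dtype, and window size are assumptions.

```
# Hypothetical sketch of Wikitext-2 token perplexity for a causal LM.
# Not the reporter's exact setup; model id, dtype, and window size are assumptions.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # swap in gemma-7b to compare the two models
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda").eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids.to("cuda")

window = 2048
nll_sum, n_tokens = 0.0, 0
for start in range(0, ids.size(1), window):
    chunk = ids[:, start:start + window]
    if chunk.size(1) < 2:
        break
    with torch.no_grad():
        # labels=chunk makes the model compute the shifted next-token loss.
        loss = model(chunk, labels=chunk).loss
    nll_sum += loss.item() * (chunk.size(1) - 1)
    n_tokens += chunk.size(1) - 1

print(f"{model_id} Wikitext-2 perplexity: {math.exp(nll_sum / n_tokens):.2f}")
```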
-
After looking at the code, neither `moe` nor `dmoe` supports tensor model parallelism.
@tgale96
-
## Goals
Following https://github.com/privacy-scaling-explorations/halo2curves/pull/86,
MSM and FFT have been moved to halo2curves, per the rationale in https://github.com/privacy-scaling-explora…
-
### Describe the issue
Issue: Multi-GPU inference is broken with LLaVA 1.6; the same command works fine with the model liuhaotian/llava-v1.5-13b.
Command:
CUDA_VISIBLE_DEVICES=0,1 python -m llava.se…
-
When I use llama.cpp to run inference with my Smaug-34B model, there is no output when the input prompt is around 150 tokens, but output is normal when it is reduced to about 100 tokens.
-
Hi.
If a tensor is created in the main thread, computing its gradients panics when the operations happen in a different thread.
Example:
```
use std::{thread::sleep, time::Duration};
use burn::{backen…