-
### System Info
Latest TGI Docker image
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported command
- [ ] My own modifications
### Reproduction
1. Use …
-
We are currently trying to apply torchtitan to MoE models. MoE models require using grouped_gemm (https://github.com/fanshiqing/grouped_gemm). GroupedGemm ops basically follow the same rule as in Column…
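For reference, the grouped GEMM at the heart of an MoE layer computes one independent GEMM per expert over a variable-sized group of tokens; the linked library fuses this into a single op (its `ops.gmm` entry point, if I read the repo correctly). A minimal plain-PyTorch sketch of the semantics, with hypothetical shapes:
```python
import torch

def grouped_gemm_reference(x, w, batch_sizes):
    """Reference semantics of a grouped GEMM for MoE.

    x: (sum(batch_sizes), hidden) tokens already sorted by expert
    w: (num_experts, hidden, ffn) one weight matrix per expert
    batch_sizes: (num_experts,) number of tokens routed to each expert
    """
    outs, start = [], 0
    for e, n in enumerate(batch_sizes.tolist()):
        outs.append(x[start:start + n] @ w[e])  # one GEMM per expert group
        start += n
    return torch.cat(outs, dim=0)

# Hypothetical shapes for illustration.
x = torch.randn(10, 16)
w = torch.randn(3, 16, 32)
sizes = torch.tensor([4, 2, 4])
y = grouped_gemm_reference(x, w, sizes)
print(y.shape)  # torch.Size([10, 32])
```
Since each expert's weight can be sharded along its output (ffn) dimension independently, the op parallelizes like a column-parallel linear under tensor parallelism.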
-
Chu has merged inference code for models quantized by QuIP# into vLLM (https://github.com/chu-tianxiang/vllm-gptq), but currently the inference code only supports tensor_parallel_size=1. The reason is "Ha…
-
Hello,
Thank you for the great work.
I was wondering whether scattermoe supports tensor parallelism?
Thank you!
-
Hi,
I was running Flan-T5 XXL with CTranslate2 and observed completely different results when running with tensor parallelism.
**To convert from HF to CT2:**
```bash
ct2-transformers-converter -…
```
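For comparison, here is a minimal sketch of the loading side with tensor parallelism enabled, assuming the converter wrote the model to `flan-t5-xxl-ct2` and using the `tensor_parallel` flag documented for recent CTranslate2 releases; per those docs, such a script is launched under MPI (e.g. `mpirun -np 2 python run.py`):
```python
import ctranslate2
import transformers

# Hypothetical output dir from ct2-transformers-converter.
translator = ctranslate2.Translator(
    "flan-t5-xxl-ct2",
    device="cuda",
    tensor_parallel=True,  # shard weights across the GPUs of this MPI job
)

tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-xxl")
tokens = tokenizer.convert_ids_to_tokens(
    tokenizer.encode("Translate to German: How are you?")
)
result = translator.translate_batch([tokens])
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(result[0].hypotheses[0])))
```
Comparing the single-GPU output of this script against the MPI run is one way to narrow down where the divergence appears.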
-
### Feature request
Being able to split models across multiple GPUs, as the vLLM/Aphrodite engines do for LLMs.
### Motivation
It would be extremely helpful to be able to split larger models into multip…
-
### System Info
I am experimenting with TRT-LLM and `flan-t5` models. My simple goal is to build engines with different configurations and tensor parallelism, then review performance. I have a DGX syst…
-
### 🚀 The feature, motivation and pitch
I am trying to run a 70B model on a node with 3x A100-80GB.
2x A100-80GB does not provide enough VRAM to run the model, and when I try to run vLLM with tensor p…
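For context, vLLM shards attention heads across tensor-parallel ranks, so `tensor_parallel_size` must evenly divide the model's attention head count; Llama-2-70B has 64 query heads, which rules out a tensor-parallel size of 3. A minimal sketch (model name assumed) of a configuration that does satisfy the constraint:
```python
from vllm import LLM, SamplingParams

# Llama-2-70B has 64 attention heads; tensor_parallel_size must divide that,
# so 1, 2, 4, or 8 ranks work for pure tensor parallelism, but 3 does not.
llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=2)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```
(On this node a two-way split would still exceed the reported VRAM, per the issue above; the sketch only illustrates the divisibility constraint.)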
-
(Question; not request)
This came up when I worked on https://github.com/NVIDIA/Fuser/pull/2450. FusionExecutor (as well as MultiDeviceExecutor) has to allocate a tensor even when the device is out…
-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
I want to use the semantic splitter from LlamaIndex for document segmentation. Is…
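In case it helps, a minimal sketch of the documented `SemanticSplitterNodeParser` usage (the data path and the choice of embedding model are assumptions):
```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

# Load documents from a local folder (path is hypothetical).
documents = SimpleDirectoryReader("./data").load_data()

# Split where embedding similarity between adjacent sentence groups drops.
splitter = SemanticSplitterNodeParser(
    buffer_size=1,                       # sentences grouped per comparison window
    breakpoint_percentile_threshold=95,  # higher => fewer, larger chunks
    embed_model=OpenAIEmbedding(),
)
nodes = splitter.get_nodes_from_documents(documents)
print(len(nodes), nodes[0].get_content()[:80])
```
The splitter embeds adjacent sentence groups and starts a new chunk where their similarity drops below the chosen percentile, so `breakpoint_percentile_threshold` is the main knob for chunk granularity.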