-
### 🚀 The feature, motivation and pitch
Support tensor parallelism for QLoRA in vLLM.
### Alternatives
_No response_
### Additional context
_No response_
-
### Proposal to improve performance
Propose synchronizing the broadcast of `tensor_dict` at the beginning of each decoding step, or blocking the process after the broadcast.
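The proposed fix could be sketched with `torch.distributed` primitives: broadcast the dict from the driver rank, then call `barrier()` so every rank blocks until the broadcast has completed. This is an illustrative sketch (the function name, dict contents, and single-process gloo setup are assumptions for the demo), not vLLM's actual implementation:

```python
import os
import torch
import torch.distributed as dist

def broadcast_and_sync(tensor_dict, src=0):
    """Broadcast tensor_dict from the source rank, then block until
    every rank has received it (the synchronization proposed above)."""
    objs = [tensor_dict if dist.get_rank() == src else None]
    dist.broadcast_object_list(objs, src=src)
    dist.barrier()  # block the process after the broadcast
    return objs[0]

# Single-process demo with the gloo backend; a real deployment would
# launch one process per tensor-parallel rank.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)
received = broadcast_and_sync({"step": torch.tensor([3])})
dist.destroy_process_group()
```

The `barrier()` after the broadcast is what prevents a fast rank from racing ahead into the next decoding step before slower ranks have consumed the metadata.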
### Report of performance regr…
-
### Your current environment
- `vllm==0.5.3.post1`
- `python=3.9`
### 🐛 Describe the bug
When using `distributed_executor_backend=mp` with `vllm==0.5.3.post1`, the process doe…
-
### System Info
```shell
In `run_generation.py` (for text generation), how can I know which kind of parallelism is being used, data or tensor? And is there a way to switch between the two?
``…
-
# 🚀 Feature request
Splitting the discussion that started here: https://github.com/huggingface/transformers/pull/10301#issuecomment-782917393 to add the potential future feature of transformers and…
-
Hi,
I'm trying to get an example working with Ray on Databricks, essentially having multiple replicas of the model. Is it possible to load a model with tensor parallelism inside a notebook?
Thank…
-
I've been using `atq.INT4_AWQ_CFG` and observing a performance drop when quantizing a Llama 70B model with tensor parallelism via `atq.quantize(model, quant_cfg, forward_loop=calibrate_loop)`.
Quan…
-
### Model Series
Qwen2
### What are the models used?
Qwen2-57B-A14B
### What is the scenario where the problem happened?
train with transformers
### Is this a known issue?
- [X] I have followed…
-
Hi,
I've noticed that you have implemented a feature that allows overlapping computation and communication in tensor-parallel operations. This is a significant enhancement that has the potential t…
-
Would it be possible in this framework to combine pipeline parallelism with tensor parallelism or ZeRO data parallelism?