-
Hi, I'd like to fine-tune bloom-7b1 with ds-chat using full model parameters, but I find it does not have any support for pipeline parallelism. Do we have any plans on supporting pipeli…
-
Hi, I want to run one LLM model across multiple machines.
Within a single node, I want to use tensor parallelism for speedup.
Across nodes, I want to use pipeline parallelism.
Is this supported? If s…
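(For illustration only, since the target framework isn't visible above: a minimal sketch of how this layout is commonly expressed, assuming a vLLM-style API where `tensor_parallel_size` covers the GPUs inside one node and `pipeline_parallel_size` spans the nodes over a Ray cluster. Model name and sizes are placeholders.)

```python
# Hypothetical sketch: 2 nodes with 4 GPUs each.
# tensor_parallel_size shards each layer across the 4 GPUs of one node;
# pipeline_parallel_size splits the layer stack across the 2 nodes.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-13b-hf",   # placeholder model
    tensor_parallel_size=4,               # intra-node tensor parallelism
    pipeline_parallel_size=2,             # inter-node pipeline parallelism
    distributed_executor_backend="ray",   # multi-node execution via a Ray cluster
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```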
-
Hello!
I'm currently studying FlashAttention v2 and noticed that when copying from global memory to shared memory, the entire HeadDim (the K dimension in MNK tiling) needs to be copied to shared m…
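To make the data layout concrete, here is a small NumPy sketch of the tiled online-softmax computation (not the real CUDA kernel, and with a simplified loop order compared to FA2): the blocking is only along the sequence length, so every Q/K/V tile spans the full HeadDim, which is why a whole (block_size, HeadDim) slice ends up in shared memory per tile.

```python
import numpy as np

def tiled_attention(Q, K, V, Br=16, Bc=16):
    """FlashAttention-style tiled attention for one head (no masking/dropout).

    Tiles are blocked only along the sequence length; every tile keeps the
    full HeadDim, i.e. K/V tiles are (Bc, HeadDim) and Q tiles are (Br, HeadDim).
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q, dtype=np.float64)
    m = np.full(N, -np.inf)   # running row-wise max of the scores
    l = np.zeros(N)           # running row-wise softmax denominator

    for j in range(0, N, Bc):
        Kj = K[j:j + Bc]      # (Bc, d): the whole HeadDim is loaded for this tile
        Vj = V[j:j + Bc]      # (Bc, d)
        for i in range(0, N, Br):
            Qi = Q[i:i + Br]                       # (Br, d)
            S = Qi @ Kj.T * scale                  # (Br, Bc) score tile
            m_new = np.maximum(m[i:i + Br], S.max(axis=1))
            P = np.exp(S - m_new[:, None])
            corr = np.exp(m[i:i + Br] - m_new)     # rescale old accumulators
            l[i:i + Br] = l[i:i + Br] * corr + P.sum(axis=1)
            O[i:i + Br] = O[i:i + Br] * corr[:, None] + P @ Vj
            m[i:i + Br] = m_new

    return O / l[:, None]

# Quick check against the dense reference implementation:
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 128)) for _ in range(3))
S = Q @ K.T / np.sqrt(128)
P = np.exp(S - S.max(axis=1, keepdims=True))
assert np.allclose(tiled_attention(Q, K, V), (P / P.sum(axis=1, keepdims=True)) @ V)
```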
-
### System Info
```shell
using Huggingface AMI from AWS marketplace with Ubuntu 22.04
optimum-neuron 0.0.25
transformers 4.45.2
peft 0.13.0
trl 0.11.4
accelerate 0.29.2
torch 2.1.2
```
…
-
**Describe the bug**
I am trying to do batch inference, so the inputs need padding. When using `replace_with_kernel_inject=True`, the engine output is incorrect. Setting `replace_with_kernel_inject…
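For context, a minimal repro sketch along these lines (model name, prompts, and dtype are placeholders rather than the exact setup from this report):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # left padding for decoder-only generation

model = AutoModelForCausalLM.from_pretrained(model_name)

# Kernel injection on; with replace_with_kernel_inject=False the padded-batch
# outputs come back correct.
engine = deepspeed.init_inference(
    model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = engine.module

prompts = [
    "Hello, my name is",
    "DeepSpeed is a deep learning optimization library that",
]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```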
-
`llama.onnx` is primarily meant for understanding LLMs and converting them to run on NPUs.
If you are looking for inference on NVIDIA GPUs, we have released lmdeploy at https://github.com/InternLM/lmdeploy.
…
-
### Describe the feature
**Problem**
The intrahost [microbenchmarking CLI tool](https://colossalai.org/docs/basics/command_line_tool/#tensor-parallel-micro-benchmarking) executes the "None" (DDP) st…
-
### Motivation.
As a continuation of #5367 - since that merge request was rejected and I now have to maintain my own fork to support this scenario, I suggest adding support in vLLM for model architec…
-
DeepSpeed Chat uses tensor parallelism via the hybrid engine to generate sequences in stage 3 training.
I wonder if just using ZeRO-3 inference for generation would be OK, so that we don't need to transform model pa…
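For reference, a minimal sketch (hypothetical config values and model name) of what "just ZeRO-3 inference for generation" would look like: parameters stay partitioned and are gathered layer by layer during the forward passes inside `generate()`, instead of being reassembled into tensor-parallel shards by the hybrid engine.

```python
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 3},
}

model_name = "bigscience/bloom-7b1"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

engine, *_ = deepspeed.initialize(model=model, config=ds_config)

inputs = tokenizer("Tell me a story:", return_tensors="pt").to(engine.device)
# synced_gpus=True keeps all ranks in lock-step, which ZeRO-3 needs because
# every rank must participate in gathering each layer's parameters.
outputs = engine.module.generate(**inputs, max_new_tokens=64, synced_gpus=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```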
-
Hello,
I am encountering an issue while testing FlexFlow's LLM module. Below is the code I am using:
```python
import flexflow.serve as ff
import time

ff.init(
    num_gpus=1,
    memory_per_gpu=2200…
```