-
**Is your feature request related to a problem? Please describe.**
Activation prefetch features to enlarge the batch size on mid-sized (100B~1T) models
- From the DeepSpeedExamples repo, GPU throughput…
-
Scaling models requires that they be trained in data-parallel, pipeline-parallel, or tensor-parallel regimes. The last two, both forms of "model parallel", require a single model to be shared across GPUs. Thi…
-
**Describe the bug**
I am trying to do batch inference, so the inputs need padding. When using `replace_with_kernel_inject=True`, the engine output is incorrect. Setting `replace_with_kernel_inject…
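
For context, here is a minimal sketch of the kind of setup being described, assuming a Hugging Face causal LM; the model name, padding configuration, and prompts are illustrative, not taken from the original report:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Kernel-injection path that reportedly gives wrong outputs on padded batches;
# the workaround implied above is replace_with_kernel_inject=False.
engine = deepspeed.init_inference(
    model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

prompts = ["a short prompt", "a much longer prompt that forces padding in the batch"]
batch = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
outputs = engine.module.generate(**batch, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```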
-
### Motivation.
As a continuation of #5367 - since that merge request was rejected and I have to maintain my own fork to support this scenario, I suggest we add support in vLLM for model architec…
-
The default DeepSpeed config for config_block_10B.json is ZeRO-2; when I change it to ZeRO-3, I get a mismatch error. Is there a way to use ZeRO-3 (loading params with CPU offload)?
In addition, if I only have…
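
For reference, a minimal sketch of a ZeRO-3 configuration with CPU parameter offload, expressed as a Python dict; only the `zero_optimization` block is the part relevant to the question, and the other values are placeholders rather than the contents of config_block_10B.json:

```python
import deepspeed  # assumed available in the training environment

# Hypothetical ZeRO-3 config: stage 3 with parameters and optimizer state
# offloaded to CPU memory.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
}

# `model` is assumed to be constructed by the surrounding training script:
# model_engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config
# )
```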
-
Hi, recently I'd like to fine-tune bloom-7b1 with ds-chat using full model parameters, but I find it does not have any support for pipeline parallelism. Do we have any plans for supporting pipeli…
-
Hi, I want to run one LLM model across multiple machines.
Within a node, I want to use tensor parallelism for speedup.
Across nodes, I want to use pipeline parallelism.
Is this supported? If s…
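
One way this layout is commonly expressed, assuming vLLM as the serving framework (the framework choice, model name, parallel sizes, and Ray backend are illustrative assumptions, not from the question):

```python
from vllm import LLM, SamplingParams

# Hypothetical 2-node setup: 8-way tensor parallelism inside each node,
# 2-way pipeline parallelism across the nodes, coordinated via a Ray cluster.
llm = LLM(
    model="meta-llama/Llama-2-70b-hf",   # placeholder model
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
    distributed_executor_backend="ray",
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```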
-
DeepSpeed Chat uses tensor parallelism via the hybrid engine to generate sequences in stage-3 training.
I wonder if just using ZeRO-3 inference for generation is OK, so that we don't need to transform model pa…
-
I want to use TE's comm-gemm-overlap module for multi-node training; however, the README says this module only supports a single node. Does TE have any plans for multi-node support? And what effort…
-
`llama.onnx` is primarily intended for understanding LLMs and converting them to run on NPUs.
If you are looking for inference on NVIDIA GPUs, we have released lmdeploy at https://github.com/InternLM/lmdeploy.
…