-
### Feature request
The current approach to tensor parallelism from #5 is not latency-optimized. We make an allgather call for every adapter, which will be quite slow when there are many adapters. Additionally,…
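For illustration only, here is a hedged sketch of the difference being described: one collective per adapter versus a single fused `all_gather` over concatenated adapter shards. All names (`gather_per_adapter`, `gather_fused`, `adapter_shards`) are hypothetical and not the repo's actual code; it assumes column-sharded adapter weights.

```
import torch
import torch.distributed as dist

def gather_per_adapter(adapter_shards):
    """One collective per adapter: latency scales with the number of adapters."""
    world = dist.get_world_size()
    gathered = []
    for shard in adapter_shards:  # shard: this rank's slice of one adapter's weights
        out = [torch.empty_like(shard) for _ in range(world)]
        dist.all_gather(out, shard)
        gathered.append(torch.cat(out, dim=-1))
    return gathered

def gather_fused(adapter_shards):
    """Single collective: concatenate all shards, all_gather once, then split back."""
    world = dist.get_world_size()
    sizes = [s.numel() for s in adapter_shards]
    flat = torch.cat([s.reshape(-1) for s in adapter_shards])
    out = [torch.empty_like(flat) for _ in range(world)]
    dist.all_gather(out, flat)
    # Split each rank's flat buffer back into per-adapter pieces and reassemble.
    per_rank = [list(torch.split(o, sizes)) for o in out]
    return [
        torch.cat(
            [per_rank[r][i].reshape(adapter_shards[i].shape) for r in range(world)],
            dim=-1,
        )
        for i in range(len(adapter_shards))
    ]
```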
-
## 🐞Describing the bug
Hello. I'm trying to convert a PyTorch model to a stateful Core ML model.
I wrote this code referring to the [WWDC 2024 session Mistral-7B model](https://github.com/huggingface/swift-t…
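For reference, here is a minimal stateful-conversion sketch of the pattern I understand coremltools 8 to support: register the mutable tensor as a buffer, mutate it in place in `forward`, and declare it via `ct.StateType` at conversion time. The toy accumulator below is an assumption for illustration, not the Mistral-7B code.

```
import numpy as np
import torch
import coremltools as ct

class Accumulator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # The state must be a registered buffer that is updated in place.
        self.register_buffer("accumulator", torch.tensor(np.array([0], dtype=np.float16)))

    def forward(self, x):
        self.accumulator += x
        return self.accumulator * x

model = Accumulator().eval()
traced = torch.jit.trace(model, torch.tensor([1.0], dtype=torch.float16))

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1,), dtype=np.float16, name="x")],
    # The StateType name must match the registered buffer name.
    states=[ct.StateType(wrapped_type=ct.TensorType(shape=(1,), dtype=np.float16),
                         name="accumulator")],
    minimum_deployment_target=ct.target.iOS18,
)
```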
-
Hi, I've recently been wanting to fine-tune bloom-7b1 with ds-chat using full model parameters, but I found it does not have any support for pipeline parallelism. Do we have any plans for supporting pipeli…
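For context on what such support would involve, here is a hedged sketch of DeepSpeed's generic pipeline-parallel API (`PipelineModule`), which is separate from DeepSpeed-Chat: the model has to be expressed as a flat sequence of layers so stages can be split across ranks. The layer list and config path below are placeholders.

```
import torch
import deepspeed
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()  # assumes launching with the deepspeed launcher

# Stand-in layers; a real model would list its transformer blocks here.
layers = [torch.nn.Linear(1024, 1024) for _ in range(24)]
model = PipelineModule(layers=layers, num_stages=4, loss_fn=torch.nn.MSELoss())

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config="ds_config.json",  # placeholder; needs batch size and optimizer settings
)
# Training then goes through engine.train_batch(data_iter) rather than a manual loop.
```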
-
### 🐛 Describe the bug
I am trying to use FSDP, but for some reason there is an error when I call `model.generate()`. MWE below:
```
import torch
import os
from omegaconf import DictConfig
from tra…
-
**Is your feature request related to a problem? Please describe.**
Activation prefetch features to enlarge the batch size on middle-sized (100B~1T) models
- From DeepSpeedExamples repo, GPU throughput…
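As background on the mechanism being requested, here is a hedged sketch using PyTorch's built-in `torch.autograd.graph.save_on_cpu` hook, which offloads saved activations to pinned CPU memory; a prefetch feature would extend this idea by copying activations back ahead of when backward needs them. The toy model and sizes are placeholders, and a CUDA device is assumed.

```
import torch
from torch.autograd.graph import save_on_cpu

model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)]).cuda()
x = torch.randn(32, 4096, device="cuda", requires_grad=True)

# Activations saved for backward live on (pinned) CPU memory instead of GPU memory,
# which is what allows a larger batch size.
with save_on_cpu(pin_memory=True):
    loss = model(x).sum()

# Tensors are copied back on demand during backward; there is no prefetch/overlap here.
loss.backward()
```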
-
DeepSpeed Chat uses tensor parallelism via the hybrid engine to generate sequences in stage-3 training.
I wonder if just using ZeRO-3 inference for generation is OK? That way we don't need to transform the model pa…
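For discussion, here is a hedged, self-contained sketch of generating directly from a ZeRO-3 engine without the hybrid engine's parameter re-sharding; the model name and config values are placeholders. Parameters stay partitioned and are gathered layer by layer during forward, and `synced_gpus=True` keeps every rank decoding in lockstep so the gather collectives don't stall when sequences finish at different lengths.

```
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
    "bf16": {"enabled": True},
}

tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")   # placeholder model
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

# Initialize a ZeRO-3 engine without an optimizer, just for generation.
engine = deepspeed.initialize(model=model, config=ds_config)[0]
engine.module.eval()

inputs = tok("Hello, my name is", return_tensors="pt").to(engine.device)
with torch.no_grad():
    seq = engine.module.generate(**inputs, max_new_tokens=64, synced_gpus=True)
print(tok.decode(seq[0], skip_special_tokens=True))
```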
-
### Describe the issue
-
### Bug summary
Encountered an issue when using `"descriptor": "dpa2"` to train a model from scratch for 500k steps and then testing the model on a merged validation dataset. The merged validatio…
-
`llama.onnx` is primarily intended for understanding LLMs and converting them to NPU.
If you are looking for inference on NVIDIA GPUs, we have released lmdeploy at https://github.com/InternLM/lmdeploy.
…
-
When running the notebook for inference using [Llama3](https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb)
```…