-
Using our launcher and the latest pull of our pretrain repo, you can run a Llama3 70B model as follows. Thanks to @AleHD for getting activation recompute and async working.
```
(export DP=1 PP=4 BA…
-
Dear torchtitan team, I have a question regarding gradient norm clipping when using pipeline parallelism (PP), potentially combined with `FSDP/DP/TP`.
For simplicity, let's assume each process/GPU h…
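To make the question concrete, here is a toy pure-Python sketch of the math involved (this is an illustration of my understanding, not torchtitan's actual code): with PP, each rank owns a disjoint shard of the parameters, so the global gradient norm has to combine the local norms before anyone clips. In real code the sum of squared local norms would be an `all_reduce` over the PP (and DP/TP) process groups; here plain lists stand in for the shards.

```python
import math

def local_sq_norm(grads):
    """Sum of squared gradient elements owned by one rank."""
    return sum(g * g for g in grads)

def clip_all_ranks(per_rank_grads, max_norm):
    """Clip so the *global* L2 norm (across all shards) is at most max_norm."""
    # Stand-in for dist.all_reduce(total_sq, op=SUM) over the model-parallel group.
    total_sq = sum(local_sq_norm(g) for g in per_rank_grads)
    global_norm = math.sqrt(total_sq)
    coef = min(1.0, max_norm / (global_norm + 1e-6))
    # Every rank scales its local shard by the same coefficient, so the
    # clipped gradient is identical to the single-device result.
    return [[g * coef for g in grads] for grads in per_rank_grads], global_norm

# Two "ranks", each holding part of the gradient; the global norm is 5.0.
clipped, norm = clip_all_ranks([[3.0, 0.0], [0.0, 4.0]], max_norm=1.0)
```

The point of the sketch is that clipping each rank's shard against its own local norm would give a different (wrong) result than clipping against the combined norm.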
-
We should get major efficiencies and speedup by running multiple "data" pathways through the same synaptic weights and network architecture. In effect, it is like "shared weights" for multiple copies…
-
Running model forwards within a process seems to get stuck. I tried setting `TOKENIZERS_PARALLELISM` to both `true` and `false`, but unfortunately neither helped 🥲
### System Info
`transformers-cli…
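One thing I have seen matter in cases like this (an assumption about the cause, not a confirmed diagnosis): `TOKENIZERS_PARALLELISM` is read when the Rust tokenizer first spawns its thread pool, so setting it after the first tokenizer call has no effect, and forking the process after that point can still deadlock regardless of the flag. A minimal sketch of the ordering that matters:

```python
import os

# Must be set before the first tokenizer call (ideally before importing
# transformers); flipping it afterwards is ignored by the Rust thread pool.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Only now import and use the tokenizer (commented out to stay self-contained):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("gpt2")
# ids = tok("hello world")["input_ids"]

# If you spawn worker processes, prefer the "spawn" start method over "fork";
# forking a parent that already has native tokenizer threads is a common
# source of hangs even with parallelism disabled.
```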
-
I don't have 4X80G GPUs. In a 4X40G environment, Qwen2-VL-72B-Instruct doesn't fit in VRAM, so I want to deploy it with model pipelining across multiple nodes, but vLLM doesn't support this. Is there a chance this will be supported later?
```bash
python3 -m vllm.entrypoints.openai.api_server --port 8000 --model /llm_weights/Qwen2-VL-72B-Ins…
-
**What is your question?**
Hello!
I’ve been exploring the Cutlass examples for GEMM and Convolution and noticed the use of double buffering.
https://developer.nvidia.com/blog/cutlass-linear-algebra-…
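To check my understanding of the pattern (a toy Python simulation of the scheduling idea, not CUTLASS code): the mainloop keeps two staging buffers and issues the load of tile k+1 before consuming the buffer holding tile k, so on a GPU the global-memory latency is hidden behind the MMA work.

```python
def gemm_mainloop_double_buffered(tiles):
    """tiles: list of (a_tile, b_tile) pairs; returns the accumulated sum of
    products plus a log showing how loads and computes interleave."""
    log = []
    buffers = [None, None]          # two shared-memory-like staging buffers
    buffers[0] = tiles[0]           # prologue: prefetch the first tile
    log.append("load tile 0 -> buf 0")
    acc = 0
    for k in range(len(tiles)):
        cur, nxt = k % 2, (k + 1) % 2
        if k + 1 < len(tiles):
            # Issue the next load *before* consuming the current buffer;
            # this is the overlap that double buffering buys.
            buffers[nxt] = tiles[k + 1]
            log.append(f"load tile {k + 1} -> buf {nxt}")
        a, b = buffers[cur]
        acc += a * b                # stand-in for the MMA on the staged tile
        log.append(f"compute tile {k} from buf {cur}")
    return acc, log

acc, log = gemm_mainloop_double_buffered([(1, 2), (3, 4), (5, 6)])
# acc == 1*2 + 3*4 + 5*6 == 44; each "load tile k+1" appears before "compute tile k".
```

Is this roughly the structure the examples implement, with the two buffers living in shared memory and the loads issued as cp.async / vectorized global loads?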
-
**Is your feature request related to a problem? Please describe.**
I’m facing an issue when deploying large models in Kubernetes, especially when the pod’s ephemeral storage is limited. Triton Infere…
-
### 🚀 The feature, motivation and pitch
I am trying to run a 70B model on a node with 3XA100-80Gi.
2XA100-80Gi does not provide enough VRAM to run the model, and when I try to run vLLM with tensor p…
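For context, my understanding (an assumption on my part) is that tensor parallelism shards the attention heads across ranks, so the TP size must divide the model's head count evenly, which is why an odd GPU count like 3 fails. A minimal sketch of that constraint:

```python
def valid_tp_size(num_attention_heads, tp_size):
    # Each TP rank must own the same whole number of attention heads.
    return num_attention_heads % tp_size == 0

# Llama-style 70B models use 64 query heads (assumed here for illustration):
assert valid_tp_size(64, 2)      # 2-way TP shards 64 heads evenly
assert not valid_tp_size(64, 3)  # 3 GPUs cannot evenly shard 64 heads
```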
-
### Motivation.
As vLLM supports more and more models and features, they require different attention, scheduler, executor, and input/output processor implementations. These modules are becoming increasingly com…
-
### Issue type
Feature Request
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
binary
### TensorFlow version
tf 2.15
### Custom code
No
### OS pla…