-
# Overview
Does DeepSpeed leverage 3D parallelism (i.e., data parallelism + pipeline parallelism + tensor parallelism) for fine-tuning Hugging Face models (e.g., GPT-J, LLaMA)?
May I ask anybody k…
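For reference, a minimal sketch of the pipeline side of DeepSpeed, which composes with ZeRO-style data parallelism across pipeline replicas; `get_layers()` and `ds_config.json` are hypothetical placeholders, and tensor parallelism for HF models typically comes via Megatron-DeepSpeed rather than this API:

```python
# Sketch only, not a full fine-tuning script: DeepSpeed's pipeline engine
# wraps a flat list of nn.Modules into stages; data parallelism is applied
# across pipeline replicas.
import deepspeed
from deepspeed.pipe import PipelineModule

layers = get_layers()  # hypothetical: the HF model split into a list of nn.Modules
model = PipelineModule(layers=layers, num_stages=4)
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # hypothetical DeepSpeed config path
)
```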
-
https://arxiv.org/abs/2402.19481
https://github.com/mit-han-lab/distrifuser
-
Hi,
I am trying to fine-tune a Llama 2 model with sequence parallelism using Megatron-DS. Is there any documentation for this?
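For what it's worth, Megatron-LM (and hence Megatron-DeepSpeed) enables sequence parallelism with a launch flag that rides on top of tensor parallelism. A hedged sketch of the relevant arguments, with the script name and all other flags as placeholders:

```python
# Hedged sketch: launching Megatron-DeepSpeed with sequence parallelism.
# --sequence-parallel shards layernorm/dropout activations along the
# sequence dimension and requires --tensor-model-parallel-size > 1.
import subprocess

subprocess.run([
    "deepspeed", "pretrain_gpt.py",          # placeholder script name
    "--tensor-model-parallel-size", "2",
    "--sequence-parallel",
    # ... model, data, and DeepSpeed config flags elided ...
])
```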
-
I have an inf2.24xlarge instance and am running the Llama-2 inference example, with all packages installed at their latest versions.
Everything worked fine until the step where I load the model with tp_degree = 24, and it faile…
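One thing to check, offered as a guess: inf2.24xlarge exposes 12 NeuronCores (6 Inferentia2 chips × 2 cores each), so tp_degree = 24 exceeds the hardware; that degree would need inf2.48xlarge. A minimal sketch assuming the transformers-neuronx Llama API, with the checkpoint path as a placeholder:

```python
# Sketch assuming the transformers-neuronx API. tp_degree must not exceed
# the NeuronCores on the instance (12 on inf2.24xlarge, 24 on inf2.48xlarge).
from transformers_neuronx.llama.model import LlamaForSampling

model = LlamaForSampling.from_pretrained(
    "path/to/llama-2-split",  # hypothetical path to the split checkpoint
    batch_size=1,
    tp_degree=12,  # fits the 12 NeuronCores of inf2.24xlarge
    amp="f16",
)
model.to_neuron()  # compile and shard the model across NeuronCores
```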
-
### Bug summary
Encountered an issue when using "descriptor": "dpa2" to train a model from scratch for 500k steps and then testing it on a merged validation dataset. The merged validatio…
-
### 🐛 Describe the bug
Hi, when I use a custom backend, I find that the FX graph the custom compiler receives does not contain the stream-related operations.
Then I found that the FX graph dropped those…
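A minimal sketch for reproducing this kind of observation: a debug backend that just prints the FX graph `torch.compile` hands it, so one can check which ops (e.g., stream operations) survive tracing. The function `f` is a placeholder workload:

```python
# Minimal debugging sketch: a custom torch.compile backend that dumps the
# FX graph it receives, then falls back to eager execution.
import torch

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    gm.graph.print_tabular()  # print every node the backend sees
    return gm.forward         # no compilation, just run the graph as-is

@torch.compile(backend=inspect_backend)
def f(x):
    return torch.relu(x) + 1

f(torch.randn(4))
```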
-
Hello.
I am currently using the vllm library alongside data parallelism for my projects.
Up until version 0.2.6, it was feasible to designate specific GPUs for each worker explicitly, which was i…
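One hedged workaround for explicit per-worker GPU assignment is to wrap each data-parallel vLLM worker in a Ray actor; Ray sets `CUDA_VISIBLE_DEVICES` per actor, pinning each `LLM` to its own device. The model name is a placeholder:

```python
# Hedged workaround sketch: one Ray actor per data-parallel vLLM worker,
# each reserving one GPU, so every LLM instance is pinned to its own device.
import ray
from vllm import LLM, SamplingParams

@ray.remote(num_gpus=1)
class VLLMWorker:
    def __init__(self, model: str):
        self.llm = LLM(model=model)

    def generate(self, prompts):
        return self.llm.generate(prompts, SamplingParams(max_tokens=32))

ray.init()
workers = [VLLMWorker.remote("facebook/opt-125m") for _ in range(2)]
results = ray.get([w.generate.remote(["Hello"]) for w in workers])
```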
-
### Describe the feature
When running evaluation, my model type is vllm, with the following parameters:
![image](https://github.com/open-compass/opencompass/assets/97608046/851bccbb-1f7f-420c-b7ca-fc00677a12cf)
However, only one GPU is used for the evaluation task:
![image](https://github.com/open-…
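If it helps, a hedged guess at the fix: OpenCompass allocates GPUs per task from the model's `run_cfg`, while vLLM's own sharding comes from `tensor_parallel_size`, so both likely need to be set. A sketch of a config entry; the exact fields may differ by version and all values are assumptions:

```python
# Hedged sketch of an OpenCompass model config entry. num_gpus controls the
# GPUs OpenCompass assigns to the task; tensor_parallel_size controls how
# vLLM shards the model across them.
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr="llama-2-7b-vllm",
        path="meta-llama/Llama-2-7b-hf",
        model_kwargs=dict(tensor_parallel_size=4),
        max_out_len=100,
        batch_size=32,
        run_cfg=dict(num_gpus=4, num_procs=1),
    )
]
```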
-
### Your current environment
-
### How would you like to use vllm
I'd like to use multiple vLLM instances in the same Python script, each on a different CUDA device. Is it possible to pin an `LLM` …
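A common pattern, sketched with hedging: because CUDA initialization is process-wide, each `LLM` usually lives in its own subprocess, with `CUDA_VISIBLE_DEVICES` set before vLLM is imported. The model name is a placeholder:

```python
# Hedged sketch: one subprocess per LLM, pinned to a GPU by setting
# CUDA_VISIBLE_DEVICES before vLLM initializes CUDA.
import os
import multiprocessing as mp

def worker(gpu_id: int, prompts: list):
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    from vllm import LLM, SamplingParams  # import after pinning the device
    llm = LLM(model="facebook/opt-125m")
    for out in llm.generate(prompts, SamplingParams(max_tokens=32)):
        print(gpu_id, out.outputs[0].text)

if __name__ == "__main__":
    mp.set_start_method("spawn")  # fresh processes, no inherited CUDA state
    procs = [mp.Process(target=worker, args=(i, ["Hello"])) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```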
-
**Context**
To compose per-parameter-sharding FSDP with `DTensor`-based tensor parallelism, we need to reshard an existing `DTensor` to its parent mesh and include the FSDP dim-0 sharding.
The cur…
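For readers unfamiliar with the setup, a minimal sketch of the 2D arrangement being discussed; the mesh shape and dim names are assumptions. The TP parameter lives on the "tp" submesh, and the open question concerns resharding it onto the parent 2D mesh so an FSDP `Shard(0)` placement can be added on the "dp" dim:

```python
# Minimal sketch of the 2D mesh under discussion (run under torchrun with
# 8 ranks): dim 0 for FSDP's dim-0 sharding, dim 1 for tensor parallelism.
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed._tensor import distribute_tensor, Shard

mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))
# A TP-sharded parameter on the 1D "tp" submesh of the parent 2D mesh.
tp_param = distribute_tensor(torch.randn(1024, 1024), mesh_2d["tp"], [Shard(0)])
```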