-
Currently, the `tools/merge_mp_partitions.py` script only supports merging tensor model parallel partitions and splitting the result into a given pipeline model parallel size, which is quite constraining. I suggest…
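For context, merging tensor-parallel shards of a weight back into one tensor is essentially a concatenation along that weight's partition dimension; a minimal sketch (the helper name and shapes are hypothetical, not taken from the script):

```python
import torch

def merge_tp_partitions(shards, partition_dim):
    """Concatenate tensor-parallel shards of one parameter, in TP-rank order.

    partition_dim is 0 for column-parallel weights and 1 for row-parallel
    weights under Megatron's convention; replicated parameters need no merge.
    """
    return torch.cat(shards, dim=partition_dim)

# Hypothetical usage: a column-parallel weight split across two TP ranks.
rank0 = torch.randn(2048, 4096)
rank1 = torch.randn(2048, 4096)
full = merge_tp_partitions([rank0, rank1], partition_dim=0)  # shape (4096, 4096)
```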
-
I am converting Mixtral-8x7B with tensor parallelism using the conversion script from the llama folder:
python convert_checkpoint.py --model_dir ./Mixtral-8x7B-v0.1 \
--out…
-
Hello, I have compared the training speed of tensor parallelism and pipeline parallelism in Megatron on a DGX A100 node.
I find that when the micro-batch size and gradient accumulation steps are bi…
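One piece of arithmetic behind that observation: for a GPipe/1F1B-style schedule with p pipeline stages and m micro-batches per step, the fraction of each device's time spent idle in the pipeline bubble is (p - 1) / (m + p - 1), so more gradient accumulation (more micro-batches) amortizes the bubble and favors pipeline parallelism. A quick illustration:

```python
def pipeline_bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle fraction of a GPipe/1F1B pipeline: (p - 1) / (m + p - 1)."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)

# With 8 pipeline stages:
for m in (1, 4, 16, 64):
    print(m, round(pipeline_bubble_fraction(8, m), 3))
# 1   0.875   (pipeline mostly idle)
# 4   0.636
# 16  0.304
# 64  0.099   (bubble nearly amortized away)
```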
-
I'm trying to figure out why my script doesn't work without the "--force_multi" param in ds_launch_str
https://github.com/microsoft/DeepSpeed-MII/blob/0fe4eb86b93e8210736f3e8c671bc886af64fd67/mii/server.py…
-
When running the notebook for inference using [Llama3](https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb)
```…
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC …
-
Are tensor parallelism and pipeline parallelism currently supported?
-
Hi,
I was able to run the _TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ_ model on 2 A10 GPUs on AWS SageMaker, using the _ml.g5.12xlarge_ instance type.
Command to run the code:
`python3 -m vllm.ent…
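For anyone reproducing this, the same configuration can also be expressed through vLLM's offline Python API; a minimal sketch (the prompt and token budget are arbitrary):

```python
from vllm import LLM, SamplingParams

# GPTQ Mixtral sharded across the 2 A10 GPUs via tensor parallelism.
llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ",
    quantization="gptq",
    tensor_parallel_size=2,
)
outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```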
-
To train a 7B model with Megatron-DeepSpeed with
tensor_parallelism=2
pipeline_parallelism=8
how many GPUs do I need?
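The required number of GPUs is the product of the parallelism degrees, so at least 2 × 8 = 16 with a data-parallel degree of 1; a quick check:

```python
# World size = tensor_parallel * pipeline_parallel * data_parallel.
tensor_parallel = 2
pipeline_parallel = 8
data_parallel = 1                  # minimum; increase it to scale throughput
print(tensor_parallel * pipeline_parallel * data_parallel)   # 16 GPUs minimum
```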
-
**Describe the bug**
In Hybrid Engine, `apply_tensor_parallelism()` is not called when the model inference container requires tp > 1 but `self.mpu` is None. For example, for a large model in ZeRO-3, …