-
I noticed there are some settings related to tensor parallelism in `DeepSpeedEngine` and `PipelineEngine`. Could you please provide some examples of combining tensor parallelism with pipeline paralle…
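As background for the question, here is a minimal plain-Python sketch (not the DeepSpeed API) of how the two schemes compose: tensor parallelism column-shards each layer's weight across "ranks", while pipeline parallelism splits the layers themselves into sequential stages. All names (`column_parallel`, `pipeline_forward`, `tp_size`) are illustrative assumptions.

```python
# Illustrative sketch only: simulates tensor parallelism (column-sharded
# matmul) composed with pipeline parallelism (sequential layer stages)
# using plain Python lists; it is not the DeepSpeed implementation.

def matmul(x, w):
    # x: [n], w: n x m -> [m]
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def column_parallel(x, w, tp_size):
    # Tensor parallelism: each rank holds a column shard of w and computes
    # a slice of the output; the slices are concatenated (an all-gather).
    shard = len(w[0]) // tp_size
    out = []
    for rank in range(tp_size):
        w_shard = [row[rank * shard:(rank + 1) * shard] for row in w]
        out.extend(matmul(x, w_shard))
    return out

def pipeline_forward(x, stages, tp_size):
    # Pipeline parallelism: each stage runs one layer, passing the
    # activation to the next stage; every stage is itself tensor-parallel.
    for w in stages:
        x = column_parallel(x, w, tp_size)
    return x

w0 = [[1, 2], [3, 4]]   # stage-0 weight (2x2)
w1 = [[1, 0], [0, 1]]   # stage-1 weight (identity)
x = [1.0, 1.0]
y = pipeline_forward(x, [w0, w1], tp_size=2)
# Sharded + staged execution matches the unsharded computation.
assert y == matmul(matmul(x, w0), w1)
```

The same composition is what a real engine would do with communication collectives in place of the list concatenation.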
-
I built my model with `--tp_size 2 --world_size 2`, put the two generated model files into the backend directory, and used the default `config.pbtxt`.
Then I ran the script/launch_triton_server.py --model_…
-
It seems there is a bug in the `use_mem_eff_path` feature when `ngroups` is greater than 1: the loss curve initially decreases but then plateaus around a constant value and fails to co…
-
# 🚀 Feature request
Splitting the discussion that started here: https://github.com/huggingface/transformers/pull/10301#issuecomment-782917393 to add the potential future feature of transformers and…
-
### The settings are as follows:
devices = 0&1&2&3;4&5&6&7
decoder_cpu_layer_count = 0
cpu_threads = 8
max_concurrent_queries = 6
return_output_tensors = true
;debug options
is_study_mod…
-
I've been using `atq.INT4_AWQ_CFG` and observing a performance drop when quantizing a Llama 70B model with tensor parallelism via `atq.quantize(model, quant_cfg, forward_loop=calibrate_loop)`.
Quan…
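For context on where a drop like this can come from, here is a hedged plain-Python sketch of per-channel symmetric INT4 weight quantization; it is not the atq/AWQ implementation, and `quantize_int4` / `dequantize` are hypothetical names, but it shows the rounding error that quantization introduces per channel.

```python
# Illustrative sketch only: symmetric per-channel (per-row) INT4
# quantization in plain Python, to show the rounding error that
# accumulates; not the atq/AWQ code path.

def quantize_int4(row):
    # Signed INT4 symmetric range [-7, 7], one scale per channel.
    scale = max(abs(v) for v in row) / 7.0 or 1.0
    q = [max(-7, min(7, round(v / scale))) for v in row]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.0, 0.25, 0.9]
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
# Reconstruction is close but not exact: the per-element error is
# bounded by half a quantization step (scale / 2).
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```

With only 15 levels per channel, outlier weights inflate the scale and coarsen every other weight in that channel, which is one common source of post-quantization accuracy loss.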
-
### 🐛 Describe the bug
I was trying to distribute a model using tensor parallelism, but I ran into a grad output type mismatch when I enabled compile. Note that I was not using loss parallel her…
-
I am following the code from the AWS documentation to host GPT-J-6B using DJL Serving:
[ https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/pytorch_deploy_l…
-
Would it be possible in this framework to combine the pipeline with tensor parallelism or ZeRO data parallelism?
-
I am unable to get the llama example to work with tensor parallelism.
I have 2x L4 machines (NVIDIA-SMI 525.105.17, Driver Version 525.105.17, CUDA Version 12.0).
When running the script
htt…