-
Thanks for your brilliant work!
I have a small question regarding the content of the paper: In Table 1, for the sequence length, will the total length of the Megatron-LM method be equal to the sum of the …
-
Hello,
Are there any plans to support an offline batch-inference mode with local models, without spinning up an additional server, similar to [what is implemented in vLLM](https://docs.vllm.ai/en/latest/get…
-
**Your question**
```[tasklist]
### Tasks
```
-
https://arxiv.org/pdf/2201.12023.pdf
-
### System Info
```Shell
- `Accelerate` version: 0.18.0
- Platform: Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.9.12
- Numpy version: 1.22.4
- PyTorch version (…
-
I am trying to build LLaMA 7B using 2-way tensor parallelism,
but when I execute run.py I get this error: `AssertionError: Engine world size (2) != Runtime world size (1)`
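This mismatch typically means the engine was built for 2-way tensor parallelism but launched as a single process; in TensorRT-LLM the runtime world size comes from the number of MPI ranks. A sketch of the fix, assuming a standard `run.py` invocation (engine and tokenizer paths below are illustrative, adjust to your build):

```shell
# An engine built with tp_size=2 expects 2 MPI ranks at runtime,
# so launch run.py under mpirun with -n 2 instead of plain python3.
mpirun -n 2 \
    python3 run.py \
    --engine_dir ./llama_7b_tp2_engine \
    --tokenizer_dir ./llama-7b
```

Running the same script without `mpirun` yields a runtime world size of 1, which triggers the assertion above.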
-
We use an A10 GPU and the CodeLlama-7B model from [HuggingFace](https://huggingface.co/codellama/CodeLlama-7b-hf/tree/main),
with the latest tensorrtllm_backend and TensorRT-LLM from the main branch.…
-
Thanks for the latest updates and improvements!
I was looking into the different LLaVA example notebooks and the [VILA example](https://github.com/mit-han-lab/llm-awq/blob/main/scripts/vila_example.s…
-
I used the following steps to build an SQ engine.
First, build the docker image from the main branch:
```
git clone -b main https://github.com/triton-inference-server/tensorrtllm_backend.git
# Update the su…
-
python3.6 test_client.py uci_housing_client/serving_client_conf.prototxt
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0425 03:51:49.812734 3687 general_model.cpp:73] feed var n…