IBM / text-generation-inference

IBM development fork of https://github.com/huggingface/text-generation-inference
Apache License 2.0

Set TP argument correctly when instantiating PagedKVCacheManager #94

Closed: tdoublep closed this 4 months ago

tdoublep commented 4 months ago

Motivation

Users are seeing runtime errors when they combine tensor parallelism (TP > 1) with speculative decoding.

Modifications

This change sets the tensor parallel argument correctly when we instantiate the PagedKVCacheManager.
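
To illustrate why this argument matters, here is a minimal sketch. It does not use the real PagedKVCacheManager API: `ToyPagedKVCacheManager`, its fields, and `check_shard` are hypothetical names invented for this example, and the shape mismatch shown is only one plausible way a wrong TP setting could surface as a runtime error.

```python
from dataclasses import dataclass


@dataclass
class ToyPagedKVCacheManager:
    """Hypothetical stand-in, NOT the real PagedKVCacheManager."""

    num_kv_heads: int              # KV heads in the full (unsharded) model
    head_dim: int                  # size of each attention head
    block_size: int = 16           # tokens stored per cache block
    tensor_parallel_size: int = 1  # assumed default when the caller omits it

    def __post_init__(self):
        if self.num_kv_heads % self.tensor_parallel_size != 0:
            raise ValueError(
                "num_kv_heads must be divisible by tensor_parallel_size"
            )

    @property
    def heads_per_rank(self) -> int:
        # Each rank only caches its own slice of the KV heads.
        return self.num_kv_heads // self.tensor_parallel_size

    def block_shape(self) -> tuple:
        return (self.block_size, self.heads_per_rank, self.head_dim)


def check_shard(manager: ToyPagedKVCacheManager, shard_kv_shape) -> bool:
    """Return True if one rank's KV tensor matches the cache block shape."""
    return tuple(shard_kv_shape) == manager.block_shape()


world_size = 2
num_kv_heads, head_dim = 8, 128

# Shape of the keys/values produced by a single rank of the sharded model.
shard_shape = (16, num_kv_heads // world_size, head_dim)

# TP left at its default: the cache expects all 8 heads on every rank.
wrong = ToyPagedKVCacheManager(num_kv_heads, head_dim)

# TP passed through: the cache expects 4 heads per rank.
right = ToyPagedKVCacheManager(
    num_kv_heads, head_dim, tensor_parallel_size=world_size
)

print(check_shard(wrong, shard_shape))
print(check_shard(right, shard_shape))
```

In this toy setup, only the manager constructed with the matching `tensor_parallel_size` agrees with the shapes of the sharded model.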

Result

I have verified that this change resolves the reported issue.

Related Issues

https://huggingface.co/ibm-fms/llama3-8b-accelerator/discussions/1