-
```
(text-generation-inference) root@C.10294313:~/tgi_test/text-generation-inference$ text-generation-launcher
2024-04-29T11:11:11.331114Z INFO text_generation_launcher: Args { model_id: "bigscie…
```
-
I'm guessing the prefix cache is stored in GPU VRAM. I'm wondering whether it's possible to allocate a percentage of system RAM to store the prefix cache, or would that generally be too slow? I.e. faster …
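As a rough sanity check on "too slow", the transfer-time arithmetic is easy to sketch in Python; the bandwidth figures below are assumptions (ballpark PCIe 4.0 x16 vs A100-class HBM), not measurements.
```python
# Back-of-envelope: moving a cached prefix's KV tensors from system RAM
# to the GPU vs reading them from HBM. All numbers are assumed values
# for illustration only.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_param: int = 2) -> int:
    # 2x for the K and V tensors per layer; fp16 by default.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_param

# Assumed 7B-class model shape with a 2k-token cached prefix.
cache = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=2048)

PCIE_BPS = 25e9    # assumed effective PCIe 4.0 x16 host-to-device bandwidth
HBM_BPS = 1500e9   # assumed effective HBM read bandwidth

print(f"prefix KV cache:         {cache / 1e9:.2f} GB")
print(f"copy from RAM over PCIe: {cache / PCIE_BPS * 1e3:.1f} ms")
print(f"read from HBM:           {cache / HBM_BPS * 1e3:.2f} ms")
```
Under these assumptions the PCIe copy is roughly 60x slower than an HBM read, but tens of milliseconds can still be cheaper than recomputing a long prefix, so the trade-off depends on prompt length and batch size.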
-
- [x] I have read and agree to the [contributing guidelines](https://github.com/griptape-ai/griptape#contributing).
Hello, I'm trying to connect a locally hosted LLM to a prompt engine; we are …
-
I am seeing the error below about max_tokens when I run benchmark_serving.py against HuggingFace TGI. Is there anything else I should be doing?
I started the server with: `./launch_tgi_server.sh facebo…
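One thing worth checking (an assumption about the cause, since the full error is cut off above): TGI's native /generate API expects max_new_tokens inside a parameters object rather than an OpenAI-style top-level max_tokens. A minimal request against a locally launched server looks like this; the address and port are assumed from the launch script.
```python
import requests

# Minimal TGI /generate request; localhost:8000 is an assumed address
# for the server started by launch_tgi_server.sh.
resp = requests.post(
    "http://localhost:8000/generate",
    json={
        "inputs": "What is Deep Learning?",
        # TGI takes max_new_tokens here, not a top-level max_tokens.
        "parameters": {"max_new_tokens": 128},
    },
)
print(resp.json()["generated_text"])
```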
-
### System Info
I was trying to run CohereForAI/c4ai-command-r-v01 with these commands:
model=CohereForAI/c4ai-command-r-v01
volume=$PWD/data # share a volume with the Docker container to avoid …
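The docker run command itself is cut off above; for reference, a docker-SDK rendering of the standard TGI launch from the README would look roughly like this sketch (the :latest image tag and host port 8080 are assumptions).
```python
import os
import docker

# Python (docker SDK) equivalent of the TGI README's docker run command;
# adjust the image tag and port mapping to your setup.
client = docker.from_env()
container = client.containers.run(
    "ghcr.io/huggingface/text-generation-inference:latest",
    command=["--model-id", "CohereForAI/c4ai-command-r-v01"],
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    shm_size="1g",
    ports={"80/tcp": 8080},
    volumes={os.path.join(os.getcwd(), "data"): {"bind": "/data", "mode": "rw"}},
    detach=True,
)
print(container.logs(tail=10).decode())
```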
-
### Bug Report
PR: Adding yarn support
https://github.com/huggingface/text-generation-inference/pull/1099
The PR handles 'yarn' as a rope_scaling type:
`elif rope_scaling["type"] == "yarn":`
But, t…
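For context, a yarn-style rope_scaling block in a model's config.json typically has the shape sketched below (field names follow existing yarn checkpoints; the values are illustrative placeholders).
```python
# Shape of a "yarn" rope_scaling entry as it appears in config.json;
# values are placeholders, not from any specific model.
rope_scaling = {
    "type": "yarn",
    "factor": 16.0,                            # context-length multiplier
    "original_max_position_embeddings": 4096,  # pre-scaling context size
}

assert rope_scaling["type"] == "yarn"  # the branch quoted above would fire
```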
-
Now that we can load GPTQ files that haven't been quantized by TGI's quantization script, I thought I'd do a set of tests to see which formats work and which don't. I'm using https://huggingface.co/Th…
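For each repo, the knobs that usually decide loader compatibility live in quantize_config.json (bits, group_size, desc_act). A quick way to pull and compare them, with a hypothetical repo id standing in for the truncated link above:
```python
import json
from huggingface_hub import hf_hub_download

# Hypothetical repo id; the actual one is truncated in the post above.
repo_id = "TheBloke/some-model-GPTQ"

# GPTQ repos ship their quantization settings in quantize_config.json.
path = hf_hub_download(repo_id=repo_id, filename="quantize_config.json")
with open(path) as f:
    cfg = json.load(f)

# bits / group_size / desc_act are the fields that most often determine
# whether a given loader can handle the checkpoint.
print({k: cfg.get(k) for k in ("bits", "group_size", "desc_act")})
```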
-
### System Info
The 'details' field is missing from the /v1/chat/completions endpoint.
This works:
```
stream_url = "http://localhost:8000/generate_stream"
payload = {
    "inputs": prompt,
    "parameters": {
…
```
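For comparison, on the native endpoints details is requested inside parameters; a minimal non-streaming example against the same server (the address mirrors the snippet above):
```python
import requests

# Requesting token-level details from TGI's native /generate endpoint.
resp = requests.post(
    "http://localhost:8000/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 32, "details": True},
    },
)
print(resp.json()["details"])
```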
-
I've been trying to deploy the new LLaVA-NeXT with SGLang on Modal, but I'm not sure why I'm getting "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tun…
-
### Describe the bug
I'm trying to run one of TheBloke's quantized models on an A100 40GB. It is not one of the most recent models.
### To reproduce
```
openllm start llama --model-id TheBloke…
```
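The command is cut off above; as a way to rule out the checkpoint itself, a minimal transformers load of a TheBloke GPTQ repo works independently of OpenLLM (hypothetical repo id; assumes optimum and auto-gptq are installed).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id standing in for the truncated one above.
model_id = "TheBloke/Llama-2-7B-GPTQ"

# transformers (with optimum + auto-gptq installed) can load GPTQ
# checkpoints directly; device_map="auto" places weights on the A100.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```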