-
Dropping the prompt from the model output is necessary to correctly retrieve an output from the Prediction object, but this is only done in HFModel when a ValueError is thrown upon mode…
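For context, a minimal sketch of the prompt-dropping step being described; `strip_prompt` is a hypothetical helper, not the actual HFModel code:

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    # Causal LMs echo the prompt at the start of their output;
    # drop it so that only the completion is returned.
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text
```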
-
Hi,
Is there a way to change the frequency_penalty or logit_bias when sending a completion request?
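If the server speaks the OpenAI completions protocol (as the OpenAI-compatible endpoints of TGI and vLLM do), both knobs can be set per request. A minimal sketch with the `openai` Python client; the model name and token ID are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY; pass base_url=... for a self-hosted endpoint

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # placeholder model name
    prompt="Once upon a time",
    max_tokens=50,
    frequency_penalty=0.8,           # -2.0 .. 2.0; penalizes tokens by how often they appear
    logit_bias={"50256": -100},      # token ID -> bias in -100 .. 100 (ID is illustrative)
)
print(response.choices[0].text)
```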
-
Hi,
I was trying to use the command below to download VarDict-1.8.3, but I get HTML files instead of zip files:
wget https://github.com/AstraZeneca-NGS/VarDictJava/releases/tag/v1.8.3/VarDict-1.8.3.tar
…
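In case it helps: GitHub serves release assets from the `releases/download` path, while `releases/tag` URLs return the release's HTML page, which would match the symptom. Assuming the tarball is attached to the v1.8.3 release under that name, the download URL would be `https://github.com/AstraZeneca-NGS/VarDictJava/releases/download/v1.8.3/VarDict-1.8.3.tar`.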
-
I have 8 Tesla V100 32 GB GPUs and set tensor_parallel_size to 8, which should be enough to run meta-llama/Llama-2-70b-chat-hf, but I am getting an out-of-memory error:
```
RuntimeError: CUDA error: out of memory
CUDA …
```
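Not a definitive fix, but a minimal sketch of the engine options that often resolve this on V100s, assuming the vLLM Python API: V100s lack bfloat16 support, so force float16, and cap `max_model_len` to shrink the KV-cache reservation made at start-up.

```python
from vllm import LLM, SamplingParams

# Sketch under assumptions: vLLM backend, 8 visible V100 GPUs.
llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",
    tensor_parallel_size=8,
    dtype="float16",              # V100 (compute capability 7.0) has no bfloat16
    max_model_len=2048,           # smaller context -> smaller KV-cache reservation
    gpu_memory_utilization=0.85,  # leave headroom for activations
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```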
-
I built and installed a custom API at https://7b80-103-253-89-37.ngrok-free.app/api/generate
Everything works fine.
But when I change the endpoint and config template in the LLM VS Code extension to this endpoi…
-
I started an `inf2.48xlarge` EC2 instance, pulled, and got into the [TGI-Neuron DLC with optimum-neuron 0.0.17 installed](https://github.com/aws/deep-learning-containers/releases/tag/v1.0-hf-tgi-0.0.17-pt-1.13.1-inf-n…
-
Hi, thanks for publishing this example.
With Mixtral + TGI, is it actually required to fit the full model in VRAM? Or is it possible to opt for 100 GB+ of system memory with lower GPU capacity?
…
-
We are using Triton Inference Server for model inference and are currently facing throughput bottlenecks with LLM inference. I saw in a public video that NVIDIA has optimized LLM serving by supporting `In…
-
➜ aiac --version
aiac version 5.2.1
We are using a local backend provided by Hugging Face TGI:
```toml
[backends.phi3]
type = "openai"
default_model = "Phi-3"
url = "https://phi3.ourcluster/…
```
-
When starting GPU inference mode with Docker, can an INT4-quantized model be used? Startup seems to fail with an error.