-
/kind feature
**Describe the solution you'd like**
I hope you can add [https://github.com/xorbitsai/inference](https://github.com/xorbitsai/inference) as a KServe Hugging Face LLM serving runtime.
Xor…
-
Hey Guys,
This is a great library, but I have a question: is this library able to use memory as efficiently as the Llama.cpp library? In other words, if I'm using a checkpoint that I use with Llama…
-
Traceback (most recent call last):
File "E:\Blender_ComfyUI\ComfyUI\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\Blen…
-
### Describe the issue
Just saw this error in our logs; I still have to investigate how to reproduce it:
[E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running…
-
This library is amazing and I'm currently using it in [TabbyAPI](https://github.com/theroyallab/tabbyAPI/tree/json-grammar) to constrain JSON generation. However, it would also be great if this librar…
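For context, JSON/grammar-constrained decoding is typically implemented by masking the model's logits at every step so that only tokens the grammar currently allows can be sampled. A toy illustration of that masking step (hypothetical function names, not this library's or TabbyAPI's actual API):

```python
import math

def mask_logits(logits, allowed_token_ids):
    """Keep logits only for tokens the grammar currently allows;
    everything else becomes -inf so softmax/sampling can never pick it."""
    return [x if i in allowed_token_ids else -math.inf
            for i, x in enumerate(logits)]

# After emitting '{', suppose the grammar only allows '"' (id 2) or '}' (id 3).
masked = mask_logits([1.5, 0.2, 0.9, 0.4], allowed_token_ids={2, 3})
# masked == [-inf, -inf, 0.9, 0.4]
```

The grammar engine's job is then just to recompute `allowed_token_ids` after each emitted token.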
-
**LocalAI version:**
OK:
- `local-ai-avx2-Linux-x86_64-1.40.0`
- `local-ai-avx2-Linux-x86_64-2.0.0`
- `local-ai-avx2-Linux-x86_64-2.8.0`
- `local-ai-avx2-Linux-x86_64-2.8.2`
- `local…
-
# Summary
Currently we have two "eval" scripts for measuring the performance of LLMs after quantization: https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py,
https://github.com/pytorch/…
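As a reminder of the main metric such eval scripts report: perplexity is the exponential of the mean per-token negative log-likelihood. A minimal sketch (hypothetical helper, not the actual `eval.py` code):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token).
    `token_nlls` holds the per-token NLLs the eval loop accumulates."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns probability 1/2 to every token has perplexity 2.
ppl = perplexity([math.log(2)] * 4)
```

Lower is better; a quantized model is usually judged by how little its perplexity rises relative to the full-precision baseline.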
-
Does vLLM support contrastive search? If not, it would be great to add that support as soon as possible. [Research](https://arxiv.org/pdf/2210.14140.pdf) shows that this improves model quality significa…
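For reference, contrastive search re-ranks the top-k candidate tokens by combining model confidence with a degeneration penalty (the candidate's maximum cosine similarity to previously generated hidden states). A minimal NumPy sketch of that scoring rule, using toy vectors (not vLLM code):

```python
import numpy as np

def contrastive_score(probs, cand_hidden, prev_hidden, alpha=0.6):
    """Contrastive-search score per candidate:
    (1 - alpha) * model confidence - alpha * degeneration penalty,
    where the penalty is the max cosine similarity between the
    candidate's hidden state and all previously generated ones."""
    prev = prev_hidden / np.linalg.norm(prev_hidden, axis=1, keepdims=True)
    cand = cand_hidden / np.linalg.norm(cand_hidden, axis=1, keepdims=True)
    penalty = (cand @ prev.T).max(axis=1)
    return (1 - alpha) * probs - alpha * penalty

# Two candidates: the first exactly repeats an earlier hidden state
# (penalty 1.0), the second is less similar to the context.
prev_hidden = np.array([[1.0, 0.0], [0.0, 1.0]])
cand_hidden = np.array([[1.0, 0.0], [1.0, 1.0]])
probs = np.array([0.6, 0.4])

scores = contrastive_score(probs, cand_hidden, prev_hidden)
best = int(np.argmax(scores))  # picks the less repetitive candidate
```

Even though the first candidate has the higher probability, the penalty steers decoding away from repetition, which is the effect the paper measures.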
-
Hi,
Congrats on this work! I discovered it from the paper page: https://huggingface.co/papers/2408.15881 (feel free to claim the paper in case you're one of the authors, so that it appears at your …
-
Hi, I am getting the following error during inference after training is completed.
File "v2_main.py", line 156, in
generated_ids = model.generate(**inputs, max_new_tokens=40)
File "/home/u…