-
Hi, I was wondering whether this method can be used to trim the large vocabularies of LLMs. Can vocab trimmer be extended to LLMs?
-
https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models
-
A game jam / hackathon around using LLMs in interesting ways, not to replace reading/writing, but to help us do it better. See ideas in https://github.com/DefenderOfBasic/works-in-progress/issues/7
…
-
Is it possible to add support for other LLMs, such as via the Ollama API?
-
### Feature Description
Need to implement the stream_chat function.
    class Vllm(LLM):
        @llm_chat_callback()
        def stream_chat(
            self, messages: Sequence[ChatMessage], **kwargs: Any…
-
Hi there!
![image](https://github.com/huggingface/nanotron/assets/49240599/38bc4c4d-f0ec-40f1-bd57-2679c7fe03f4)
Microsoft have just released the full handbook for reproducing the 1-bit LLM pape…
-
### Bug Description
The query engine gives an incomplete streaming response when using Gemini LLMs. Whenever streaming is enabled, the first chunk of the output text is missing, but if streaming is di…
-
My environment:
sglang 0.1.17
torch 2.3.0
CUDA 11.8
My Problem:
My sglang works well on qwen1.5-4B and qwen1.5-0…
-
My initial testing compares ct2 (using int8) with the `bitsandbytes` library at 4 and 8 bits. Nicely done, ctranslate2 people. Looking forward to testing GGUF in there as well.
![image](https:/…