-
Hi there, you made a fantastic framework for LLMs. But what I find very confusing is how to run this on CUDA and DirectML. I simply don't know how to do it in C#.
Is there any example? Second questio…
-
### The quantization format
Hi all,
We have recently designed and open-sourced a new method for Vector Quantization called Vector Post-Training Quantization (VPTQ). Our work is available at [VPTQ…
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a sim…
-
### Feature Description
When using LLM serving frameworks such as [vLLM](https://github.com/vllm-project/vllm) or [MLC-LLM](https://github.com/mlc-ai/mlc-llm), or services that host open-source mod…
-
Hi, I want to build my own chat AI interface for study and a personal project. I find the library interesting; the only problem is that I use Ollama as my LLM provider (a local server for LLMs). So I think that…
-
Quark is a comprehensive cross-platform toolkit designed to simplify and enhance the quantization of deep learning models. Supporting both PyTorch and ONNX models, Quark empowers developers to optimiz…
-
### Describe the bug
APICallError [AI_APICallError]: prompt is too long: 202609 tokens > 200000 maximum
at file:///C:/Bolt/bolt.new-any-llm/node_modules/.pnpm/@ai-sdk+provider-utils@1.0.9_zod@3.…
-
A tool call is not generated from the response of the Llama 3.1 model served by LM Studio when using the LangChain framework connected through ChatOpenAI.
The same tool call works fine with Ollama for the same …
-
**Is your feature request related to a problem? Please describe.**
This feature proposal introduces a toolkit for SQL and data analysis by enhancing the current SQL tool and introducing a few other conc…
-
### Model Series
Qwen2.5
### What are the models used?
Qwen2.5-0.5B-Instruct
### What is the scenario where the problem happened?
inference with transformers, deployment with vllm/PeftModelForCau…