-
# Platform (include target platform as well if cross-compiling):
ubuntu 20.04 cuda
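Exporting the qwen2.5-0.5b model with the latest MNN 3.0 release: 4-bit quantization works correctly, but 8-bit quantization produces garbled output (regardless of whether "precision": "fp16" is modified).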
#…
-
This issue is to document possible code snippets that could be useful.
1. From a first principles perspective
2. From a framework perspective
(1) is to show the mechanics of what is going on un…
-
Large Language Models (LLMs) are all the rage in the programming world. Some models like Claude 3 Sonnet are reaching capabilities making them very useful for scaffolding code, especially Elm code, wh…
-
#### Steps to Reproduce
-------------------------------------------
1. Compare the chapter title and "Related Frameworks and Taxonomies" between LLM01-10 entries and Appendix 1: LLM Application Arch…
-
**server:** inf2.8xlarge
**vllm version**: 0.6.3.post2.dev77+g2394962d.neuron215
_Description_
Hello! I am trying to run the code below (the code was taken [here](https://docs.vllm.ai/en/v0.4.1/…
-
# URL
- http://arxiv.org/abs/2411.04282
# Authors
- Haolin Chen
- Yihao Feng
- Zuxin Liu
- Weiran Yao
- Akshara Prabhakar
- Shelby Heinecke
- Ricky Ho
- Phil Mui
- Silvio Savarese
…
-
**Is your feature request related to a problem? Please describe.**
LLMs usually do well in PII detection and de-identification. Using LLMs to identify PII in text could allow users to easily expand P…
-
Hi team,
I would like to use the LogitsPostProcessor in the [C++ Executor API](https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/include/tensorrt_llm/executor/executor.h) to control the generatio…
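For readers unfamiliar with the idea, here is a minimal sketch of what a logits post-processing step does in general: it rewrites the model's raw scores before a token is chosen, which lets the caller constrain generation. This is plain Python on a toy list of scores, not the TensorRT-LLM C++ Executor API; the names `restrict_logits` and `greedy_pick` and the 5-token vocabulary are hypothetical and exist only for illustration.

```python
import math
from typing import Iterable, List


def restrict_logits(logits: List[float], allowed_ids: Iterable[int]) -> List[float]:
    """Return a copy of `logits` with every token id outside `allowed_ids` set to -inf."""
    allowed = set(allowed_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def greedy_pick(logits: List[float]) -> int:
    """Pick the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy vocabulary of 5 tokens (hypothetical); token 3 has the highest raw score.
raw = [0.1, 2.0, -1.0, 3.5, 0.7]

# Post-process: only tokens 0, 1 and 4 may be generated at this step.
masked = restrict_logits(raw, allowed_ids=[0, 1, 4])

# Token 3 is masked out, so the best remaining allowed token is selected.
chosen = greedy_pick(masked)
```

Setting a score to negative infinity gives that token zero probability after softmax, so the same masking works for sampling as well as for greedy selection.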
-
**Is your feature request related to a problem? Please describe.**
Creating new threads adds overhead when using the streaming response feature.
This Drogon example demonstrates it very well: …
-
Unsloth is not supported with CUDA 12.4. Are there any alternative methods to use Unsloth with CUDA 12.4? Also, are there any other frameworks supported with CUDA 12.4 for continual pretraining of llm…