-
When I tested Qwen2-7B on this library, it reported some errors.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from intel_npu_acceleration_library import NPUModelForCausalL…
```
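For context, a minimal sketch of what such a test typically looks like, assuming the truncated import is `NPUModelForCausalLM` (the class the library's own examples use); the model id and prompt are placeholders, not the reporter's actual setup:

```python
from transformers import AutoTokenizer
from intel_npu_acceleration_library import NPUModelForCausalLM  # assumed completion of the truncated import

model_id = "Qwen/Qwen2-7B-Instruct"  # placeholder; the issue only says "Qwen2-7B"

# Load the model compiled for the NPU; quantization options exist but defaults are used here.
model = NPUModelForCausalLM.from_pretrained(model_id, use_cache=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, who are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```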
-
Below are the benchmark results on both THUDM/chatglm3-6b and openbmb/MiniCPM-2B-sft-bf16, from which we can see that chatglm3-6b has higher throughput than MiniCPM-2B. Considering MiniCPM-2B is a 2…
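For reference, a minimal sketch of how per-model decode throughput (tokens/s) can be measured with plain transformers; the model id, prompt, and generation length are placeholders, not the benchmark's actual harness:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/chatglm3-6b"  # or "openbmb/MiniCPM-2B-sft-bf16"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
).eval()

inputs = tokenizer("Tell me about the history of the Internet.", return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

# Throughput = newly generated tokens divided by wall-clock generation time.
new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tok/s")
```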
-
As the title asks, and beyond the title: is there any way to implement RWKV, LLaVA, MiniCPM-V, InternLM-XComposer, or Qwen-VL?
-
```py
GRADE_MODEL_NAME = ApolloClient.get_value(key="GRADE_MODEL_NAME", default_val="",
                                          namespace=SERVICE_NAMESPACE)
XINFERENCE_URL = ApolloCli…
```
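Presumably these config values then feed an xinference client; a hypothetical sketch of that wiring (the `RESTfulClient` API is real, but the values and the grading-model name mirror the truncated snippet and are assumptions):

```python
from xinference.client import RESTfulClient

# Values the truncated snippet reads from Apollo config (hardcoded here for illustration).
XINFERENCE_URL = "http://127.0.0.1:9997"  # assumed value
GRADE_MODEL_NAME = "my-grade-model"       # assumed value

client = RESTfulClient(XINFERENCE_URL)
print(client.list_models())                 # sanity-check the connection
model = client.get_model(GRADE_MODEL_NAME)  # look the grading model up by uid
```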
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a…
-
"We have converted the MiniCPM model weights into a [format](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16-llama-format) that the Llama code can call directly, so that everyone can try it."
How was this conversion done? Could you provide the conversion script?
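For reference, a hedged sketch of what such a conversion usually involves: MiniCPM's architecture is Llama-like but applies extra runtime scaling (an embedding scale, a per-layer residual scale, and a logit scale), so those factors have to be folded into the weights. The constants below (scale_emb=12, scale_depth=1.4, 40 layers, hidden_size/dim_model_base = 2304/256) are my reading of the MiniCPM-2B config and should be checked against the official repo; this is not the authors' script.

```python
import math

# Constants from the MiniCPM-2B config (verify against the official repo).
SCALE_EMB = 12.0
SCALE_DEPTH = 1.4
NUM_LAYERS = 40
HIDDEN_SIZE, DIM_MODEL_BASE = 2304, 256

def minicpm_to_llama_state_dict(sd: dict) -> dict:
    """Fold MiniCPM's extra scaling factors into plain Llama-style weights."""
    out = dict(sd)
    residual_scale = SCALE_DEPTH / math.sqrt(NUM_LAYERS)

    # 1) Input embeddings: MiniCPM multiplies embeddings by scale_emb at runtime.
    out["model.embed_tokens.weight"] = sd["model.embed_tokens.weight"] * SCALE_EMB

    # 2) Residual branches: each block's output is scaled before being added back,
    #    which can be folded into the projections that produce the branch output.
    for i in range(NUM_LAYERS):
        for name in (f"model.layers.{i}.self_attn.o_proj.weight",
                     f"model.layers.{i}.mlp.down_proj.weight"):
            out[name] = sd[name] * residual_scale

    # 3) Logits: MiniCPM divides the final hidden state by hidden_size/dim_model_base.
    lm_head = sd.get("lm_head.weight", sd["model.embed_tokens.weight"])  # weights may be tied
    out["lm_head.weight"] = lm_head / (HIDDEN_SIZE / DIM_MODEL_BASE)

    return out

# Usage (illustrative): new_sd = minicpm_to_llama_state_dict(torch.load("pytorch_model.bin"))
```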
-
Does minicpm-llama3-v-2_5 (int4) support concurrent API calls? With two or more concurrent calls it errors out; a single call works fine.
![微信截图_20240619180246](https://github.com/xorbitsai/inference/assets/167763677/caba7bf3-199d-4a24-88a3-b0e9833b50b2)
![微信截图_2024061918…
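A hypothetical repro sketch for the concurrency report, assuming xinference's OpenAI-compatible endpoint at its default port; the host, model uid, and image payload are placeholders:

```python
import base64
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://127.0.0.1:9997/v1/chat/completions"  # assumed xinference host/port

with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "MiniCPM-Llama3-V-2_5",  # placeholder model uid
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
}

def call(_):
    r = requests.post(URL, json=payload, timeout=120)
    return r.status_code

# One call succeeds; two or more in flight reportedly fail.
with ThreadPoolExecutor(max_workers=2) as pool:
    print(list(pool.map(call, range(2))))
```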
-
### Describe the bug
Deploying bge-reranker-v2-minicpm-layerwise with the latest version of xinference: the ModelScope download fails; after switching to Hugging Face the deployment succeeds, but inference is extremely slow, to the point of being practically unusable.
```
You're using a LlamaTokenizerFast tokenizer. Please note …
```
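For context, a hedged sketch of how the deployed reranker would be exercised and timed; the host, model uid, and documents are placeholders, and the `/v1/rerank` route is how xinference exposes rerank models as I understand it:

```python
import time
import requests

URL = "http://127.0.0.1:9997/v1/rerank"  # assumed xinference host/port

payload = {
    "model": "bge-reranker-v2-minicpm-layerwise",  # placeholder model uid
    "query": "What is MiniCPM?",
    "documents": [
        "MiniCPM is a family of small language models from OpenBMB.",
        "The Great Wall of China is visible from low Earth orbit.",
    ],
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=300)
print(f"took {time.perf_counter() - start:.1f}s")  # timing exposes the reported slowness
print(resp.json())                                  # relevance scores per document
```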
-
### Your current environment
```text
Problem that occurs when running python examples/minicpmv_example.py directly after installation:
INFO 06-27 10:16:32 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/local/lib/pytho…
```
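For reference, a minimal sketch of what a MiniCPM-V run on vLLM typically boils down to; the model id and prompt are assumptions (the exact chat template and image plumbing vary by vLLM version), not the contents of the failing example:

```python
from vllm import LLM, SamplingParams

# Assumed model id; MiniCPM-V needs trust_remote_code for its custom modeling code.
llm = LLM(model="openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Describe the picture."], params)  # text-only smoke test
for out in outputs:
    print(out.outputs[0].text)
```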
-
The run log is as follows:
```
~/llm/demo/server_demo$ sudo python flask_server.py --target_platform rk3588 --rkllm_model_path ../../model/minicpm.rkllm
=========init....===========
rkllm-runtime version: 1.0.1, rknpu d…
```
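A hypothetical client-side sketch for poking that Flask server once it is up; the port and route name are guesses based on typical RKLLM server demos, not taken from the log, so check flask_server.py for the actual values:

```python
import requests

# Assumed host/port and route; verify against flask_server.py.
URL = "http://127.0.0.1:8080/rkllm_chat"

payload = {
    "model": "minicpm.rkllm",
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
}

resp = requests.post(URL, json=payload, timeout=120)
print(resp.status_code, resp.text)
```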