-
Hello, when I run run.py,
```
mpirun -n 2 --allow-run-as-root \
python3 run.py --max_output_len=1024 \
--tokenizer_dir /root/autodl-tmp/llama-2-7b \
--engine_dir=/…
-
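### Please ask your question
While building the end-to-end semantic retrieval system, I reached step
3.4.2 "Write document data to the ANN index library"
and got the following error: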
[2023-08-15 14:37:11,246] [ INFO] - We are using (, False) to load 'rocketqa-zh-nano-query-encoder'.
[2023-08-15 14:37:11,247] [ …
-
Please solve the problem in this code:
import torch
import uvicorn
import gc
import asyncio
import argparse
import io
from fastapi import FastAPI, WebSocket, Depends
from fastapi.responses …
-
# Overview
I have long wanted proper streaming support in the `encoding/json` library. I’ve been doing some homework to understand the current state of things, and I think I’ve come to grips with m…
-
### System Info
- DGX H100
- TensorRT-LLM 0.7.1
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X] An officially s…
-
### Describe the bug
I can't use or download the NIH ExPorter subset of the Pile.
```
15 experiment_compute_diveristy_coeff_single_dataset_then_combined_datasets_with_domain_weights()
16 File "/lfs/am…
-
As a contributor to CTranslate2 (and, e.g., also vLLM), I would like to add a minimal example.
Background: it's fast, supports int8 quantization, and supports streaming.
https://hamel.dev/notes/llm/03_inference.html
…
-
### Describe the bug
Crash with abort when trying to use AMD graphics card in editor
Model is mistral-7b-instruct-v0.2.Q4_K_M.gguf
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX…
-
Hi, this is a great project. Can you provide some sample data for local development and testing? I would like to try it out. Thank you very much!
-
I am trying to use a custom tokenizer, but I can't see how to invoke it. Also, can we use a standard tokenizer from HF, either by pulling it from the Hub or by loading it from a local path?