-
### Branch/Tag/Commit
main
### Docker Image Version
nvcr.io/nvidia/pytorch:22.07-py3
### GPU name
A100
### CUDA Driver
450.156.00
### Reproduced Steps
```shell
1. download …
```
-
https://github.com/triton-inference-server/
- [x] Build Triton Docker image with support for FasterTransformer backend for Fusion etc.
- [x] Convert h2oGPT models to a format that Triton understands h… (a rough conversion sketch follows below)
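For the conversion step, here is a minimal sketch of the general idea: each weight tensor is dumped to a raw binary file that the backend can load. This is not the official FasterTransformer converter (the repo ships its own conversion scripts, which also handle tensor-parallel splitting and FT's naming scheme); the model name and output directory below are placeholders.

```python
# Rough sketch of what an HF -> FasterTransformer weight conversion does:
# every parameter is dumped as a raw fp16 binary file.
# This is NOT the official converter; model name and output dir are placeholders.
import os
import torch
from transformers import AutoModelForCausalLM

model_name = "h2oai/h2ogpt-oig-oasst1-512-6_9b"  # placeholder; substitute the actual h2oGPT checkpoint
out_dir = "ft_weights/1-gpu"                      # placeholder output directory
os.makedirs(out_dir, exist_ok=True)

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

for name, param in model.state_dict().items():
    # One raw fp16 file per tensor; the real converter additionally splits
    # tensors across tensor-parallel ranks and renames them to FT's scheme.
    param.to(torch.float16).cpu().numpy().tofile(os.path.join(out_dir, name + ".bin"))
```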
-
I found that the benchmark/suite output includes time to first token. However, when I run `python benchmark.py --model meta-llama/Llama-2-7b-hf static --isl 128 --osl 128 --batch 1`, an error occurs:…
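For context, time to first token is usually measured by timing generation and recording when the first new token is produced. A minimal, generic sketch follows; it is not this repo's `benchmark.py`, and the model name and prompt are placeholders.

```python
# Generic sketch of measuring time-to-first-token (TTFT).
# Not taken from benchmark.py; model name and prompt are placeholders.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tok("Hello, world", return_tensors="pt").to(model.device)

start = time.perf_counter()
with torch.no_grad():
    # Generating a single new token approximates the time to first token.
    model.generate(**inputs, max_new_tokens=1)
ttft = time.perf_counter() - start
print(f"time to first token: {ttft * 1e3:.1f} ms")
```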
-
@byshiue
### Branch/Tag/Commit
main
### Docker Image Version
nvcr.io/nvidia/pytorch:21.11-py3
### GPU name
TITAN
### model
https://huggingface.co/TabbyML/NeoX-1.3B
### Repr…
-
Hello,
It seems that currently int8 weight-only and SmoothQuant quantization are supported for GPT models, but no kind of quantization is supported for other autoregressive transformer models, suc…
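For reference, "int8 weight-only" usually means weights are stored as int8 with a per-output-channel scale and dequantized on the fly, while activations stay in fp16. A minimal NumPy illustration of the quantize/dequantize round trip (an illustration of the idea only, not FasterTransformer's kernels or packing layout):

```python
# Minimal illustration of int8 weight-only quantization with per-output-channel
# absmax scales. Sketch of the idea only; not the library's actual implementation.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)  # [out_features, in_features]

# Per-output-channel scale so that the largest weight maps to 127.
scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# At inference the int8 weights are dequantized (or the GEMM consumes the
# scales directly); activations remain in fp16.
w_dequant = w_int8.astype(np.float32) * scale
print("max abs error:", np.abs(w - w_dequant).max())
```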
-
### Branch/Tag/Commit
main
### Docker Image Version
not-specific-to-docker-image
### GPU name
all GPUs
### CUDA Driver
n/a
### Reproduced Steps
```shell
Merely running the example at https://…
```
-
Hello,
I'm running the following code snippet in `opt.py`.
```python
import mii
mii_configs = {"tensor_parallel": 8, "dtype": "fp16", "load_with_sys_mem": True}
mii.deploy(task="text-generation", …
```
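For context, with the legacy DeepSpeed-MII API a deployment created this way is typically queried through a handle. A minimal sketch, assuming a deployment name of "opt-deployment" (the real name in the truncated `mii.deploy(...)` call above is not shown):

```python
# Sketch of querying a legacy DeepSpeed-MII deployment; "opt-deployment" is an
# assumed deployment_name, since the truncated deploy call above does not show it.
import mii

generator = mii.mii_query_handle("opt-deployment")
result = generator.query({"query": ["DeepSpeed is"]}, max_new_tokens=64)
print(result)
```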
-
@bojone
After switching the model to chatglm2, there are no errors, but the output quality is extremely poor. I hope you can help resolve this!
Below is the generated run output:
Loading checkpoint shards: 100%|██████████████████| 7/7 [00:08
-
import os
import pickle
from typing import List
from dataclasses import field, dataclass
from utils import set_default_to_empty_string
FOLDER_ROOT = (
    os.path.abspath(os.path.dirname(os.pa…
-
bash training/finetune_RedPajama-INCITE-Chat-3B-v1.sh
My configuration changes are as follows:
--lr 1e-5 --seq-length 2048 --batch-size 8 --micro-batch-size 1 --gradient-accumulate-step 1 \
--num-layers…
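For reference, a quick check of how these flags usually relate, under the common convention that global batch = micro-batch × gradient-accumulation steps × data-parallel degree (this script's exact flag semantics may differ, so treat the derived value as an assumption):

```python
# Quick arithmetic check of the usual batch-size relationship; the exact
# meaning of --batch-size in this training script may differ, so the
# data-parallel degree derived below is an assumption.
micro_batch_size = 1      # --micro-batch-size
grad_accum_steps = 1      # --gradient-accumulate-step
target_batch_size = 8     # --batch-size

# Under the common convention, reaching the target batch size would require
# this many data-parallel replicas:
data_parallel_degree = target_batch_size // (micro_batch_size * grad_accum_steps)
print(data_parallel_degree)  # 8
```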