-
Is it possible to use a TPU for inference?
The team at [NLPCloud.io](https://nlpcloud.io) told me that's what they're doing, but I have no idea how they do it...
First, I don't know how to su…
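For what it's worth, here is a minimal sketch of one common route (not necessarily what NLPCloud.io does): the Flax/JAX port of GPT-J in Hugging Face transformers (`FlaxGPTJForCausalLM`), assuming a TPU VM or runtime where JAX already detects the TPU cores:

```python
# Minimal sketch of TPU inference via the Flax/JAX GPT-J port in
# Hugging Face transformers. Assumes a TPU host where JAX detects
# the devices; not necessarily how NLPCloud.io does it.
import jax
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxGPTJForCausalLM

print(jax.devices())  # should list TpuDevice entries on a TPU host

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = FlaxGPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", dtype=jnp.bfloat16
)

inputs = tokenizer("Hello, my name is", return_tensors="np")
outputs = model.generate(inputs["input_ids"], max_length=32)
print(tokenizer.decode(outputs.sequences[0]))
```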
-
It would be great to have INT8 support for GPT-J: weight-only INT8 at a minimum, but ideally W8A8 (int8 weights and activations) as well.
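Not the W8A8 path requested here, but for reference, GPT-J already runs with 8-bit weights in the Hugging Face stack via bitsandbytes (LLM.int8()). A minimal sketch, assuming a CUDA GPU with `bitsandbytes` and `accelerate` installed:

```python
# 8-bit GPT-J via bitsandbytes: int8 weight storage with mixed
# int8/fp16 matmuls (LLM.int8()). Requires a CUDA GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    device_map="auto",
    load_in_8bit=True,  # quantize weights to int8 at load time
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```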
-
### Description
```shell
branch: main
fastertransformer docker: 22.12
```
### Reproduced Steps
```shell
docker run -it --rm --gpus=all --shm-size=1g --ulimit memlock=-1 -v ${WORKSPACE}:…
-
### Description
Expected behavior:
```shell
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
>>> tokenizer.encode('')
[50256]
``…
-
It looks like EleutherAI/gpt-j-6b is not supported.
Environment (running from Docker):
```
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
RUN apt-get update && apt-get install git -y
RUN pip …
-
Hi, I have a question about a tokenizer mismatch.
When the reference model is fixed to "gpt-j-6B", several scoring models, such as "gpt-neox-20b" and "llama", do not share the same tokenizer. …
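For illustration, a small sketch that makes the mismatch visible by tokenizing the same text with both tokenizers (the llama tokenizer differs in the same way, but its repo is gated, so it is omitted here):

```python
# Tokenize the same text with the reference and a scoring model's
# tokenizer to show they produce different ids and vocab sizes.
from transformers import AutoTokenizer

text = "The quick brown fox jumps over the lazy dog."
for name in ["EleutherAI/gpt-j-6B", "EleutherAI/gpt-neox-20b"]:
    tok = AutoTokenizer.from_pretrained(name)
    ids = tok.encode(text)
    print(f"{name}: vocab={tok.vocab_size}, n_tokens={len(ids)}, ids[:6]={ids[:6]}")
```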
-
The current implementations of GPT-J and BERT carry out prediction sequentially. Could their performance be improved by implementing parallel processing through threads ra…
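A minimal sketch of the threading idea, with a hypothetical `predict` stand-in for the model call; whether this helps in practice depends on whether the backend releases the GIL and on CPU/GPU contention, and batching requests is often the better-supported route:

```python
# Run several predictions concurrently with a thread pool.
# `predict` is a hypothetical stand-in for the model call.
from concurrent.futures import ThreadPoolExecutor

def predict(prompt: str) -> str:
    return prompt[::-1]  # placeholder for actual inference

prompts = ["first prompt", "second prompt", "third prompt"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(predict, prompts))
print(results)
```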
-
I would like to run the `ggml/gpt-j` version on the MLPerf benchmark. Is it possible to use a fine-tuned GPT-J checkpoint listed here: https://github.com/mlcommons/inference/blob/master/language/gpt-j…
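A rough sketch of the pipeline this implies: pull the checkpoint from the Hugging Face Hub, then convert it with ggml's GPT-J conversion script. The script path and arguments below are assumptions; check the ggml repo for the exact invocation:

```python
# Fetch a GPT-J checkpoint locally for conversion to ggml format.
# The base model is used here as a placeholder for the fine-tuned
# MLPerf checkpoint.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("EleutherAI/gpt-j-6B")
print(local_dir)

# Then, from a ggml checkout (hypothetical invocation; verify the
# script name and arguments in the repo):
#   python examples/gpt-j/convert-h5-to-ggml.py <local_dir> 1
```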
-
Has this library been tested with larger models such as GPT-J-6B and GPT-NeoX-20B? Are there plans to support larger models like these? Thanks.
-
Is it possible to have swap-space support? (I heard about "Handling big models for inference" and was wondering whether ggml could support a similar feature, or store part of a large model in swap.)
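For context, the "Handling big models for inference" feature refers to Hugging Face Accelerate's disk offload; a ggml equivalent would presumably behave similarly. A minimal sketch of the Accelerate side:

```python
# Layers that don't fit in VRAM/RAM are spilled to disk and streamed
# back in during the forward pass (Accelerate's disk offload).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    device_map="auto",         # place layers on GPU/CPU as they fit
    offload_folder="offload",  # spill the remainder to disk
)
```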