-
**Describe the bug**
Basically, when conversations exceed about 5,500-6,000 tokens of context, the time Silly takes to make a request to the proxy server increases to around 15-20 seconds. That …
-
### System Info
OS: Ubuntu 20.04.6
Kernel: Linux 5.4.0-174-generic
CPU: Intel x86_64
GPU: NVIDIA A800
TensorRT-LLM: v0.8.0
tensorrtllm_backend: v0.8.0
Docker container: nvcr.io/nvidia/tritons…
-
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 4: invalid continuation byte
OSError: It looks like the config file at './models/ggml-gpt4all-j-v1.3-groovy.bin' is not a valid JS…
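Both tracebacks are consistent with one root cause: a binary GGML model file being opened as if it were a UTF-8 text/JSON config. A minimal pre-flight check can distinguish the two cases before handing the path to a loader (the helper name is hypothetical, not part of any library's API):

```python
# Hypothetical sketch: a GGML .bin file contains raw bytes such as 0xE0
# that are not valid UTF-8, so decoding it as text raises
# UnicodeDecodeError, and json.load() fails for the same reason.
import json


def looks_like_json_config(path):
    """Return True if the file parses as UTF-8 JSON, False if it is
    binary or otherwise not a valid JSON document."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
```

A `False` result for a path you expected to be a config file usually means a model weights file was passed where a JSON config was required.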
-
### System Info
- `transformers` version: 4.31.0.dev0
- Platform: Linux-5.15.107+-x86_64-with-glibc2.31
- Python version: 3.10.12
- Huggingface_hub version: 0.16.2
- Safetensors version: 0.3.1
-…
-
Hi,
I reported slow data fetching when the data is large (#2210) a couple of weeks ago, and @lhoestq referred me to the fix (#2122).
However, the problem seems to persist. Here is the profiled resu…
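For anyone reproducing this, a profile like the one above can be captured with the standard library alone; a small sketch (the helper name and the function being profiled are hypothetical):

```python
# Minimal profiling harness: run a callable under cProfile and return
# both its result and a text report of the top entries by cumulative time.
import cProfile
import io
import pstats


def profile_call(fn, *args, **kwargs):
    """Profile fn(*args, **kwargs); return (result, stats_report_str)."""
    pr = cProfile.Profile()
    pr.enable()
    result = fn(*args, **kwargs)
    pr.disable()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()
```

Wrapping the fetch call this way makes it easy to attach a comparable report to the issue for different dataset sizes.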
-
- I ran the [code](https://github.com/huggingface/transformers-bloom-inference/blob/main/bloom-inference-scripts/bloom-ds-zero-inference.py); it uses ZeRO.
My code is below:
```
# usage:
# de…
-
Hello,
Please help resolve the following issue.
I built my own recipe based on _egs2/librispeech/asr1_.
I was able to successfully run all the stages, where I used GPU for decoding.
However…
-
### Describe the bug
When using a Llama 2 model like `TheBloke_MythoMax-L2-13B-GPTQ` with my own settings parameters, I get this weird warning on the console:
```D:\OOBABOOGA\installer_f…
-
### Bug Description
Hi,
I'm following this official [example](https://gpt-index.readthedocs.io/en/latest/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.html) from the docs but I'…
-
When I run `pip install -e ".[gpu]"`, I hit an error about mosaicml-streaming:
#-------------------------------------------------------------------------------------------
root@7730f5bd29fa:/hom…