-
Hi - really interesting work. We're currently using HF TGI in production and exploring using this instead. Are there plans to add sampling parameters like typical_p, which transformers supports? Would greatly ease t…
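For context, typical (locally typical) sampling keeps the tokens whose surprisal is closest to the distribution's entropy until their cumulative mass reaches `typical_p`. A minimal NumPy sketch of that filtering step, purely for illustration (the function name and shapes here are made up, not TGI's or transformers' API):

```python
import numpy as np

def typical_filter(logits, typical_p=0.9):
    """Mask logits to the smallest set of 'typical' tokens.

    Tokens are ranked by how close their surprisal (-log p) is to the
    entropy of the full distribution; we keep the closest ones until
    their cumulative probability reaches typical_p.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Entropy of the full next-token distribution.
    entropy = -(probs * np.log(probs + 1e-12)).sum()
    # Distance of each token's surprisal from that entropy.
    shift = np.abs(-np.log(probs + 1e-12) - entropy)
    order = np.argsort(shift)
    cum = np.cumsum(probs[order])
    # Smallest prefix of "typical" tokens covering typical_p mass.
    cutoff = np.searchsorted(cum, typical_p) + 1
    keep = order[:cutoff]
    masked = np.full_like(logits, -np.inf)
    masked[keep] = logits[keep]
    return masked
```

The masked logits would then be softmaxed and sampled as usual; a near-uniform distribution keeps most tokens, a sharply peaked one keeps only the dominant token.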
-
I have encountered the problem mentioned in the title. Could someone help me understand what is going on and how to resolve it?
Any assistance would be greatly appreciated.
-
At this point the directory `tgis` has not yet been copied:
https://github.com/mundialis/actinia_core/blob/e1510fce08f8d40d5fc048dc7fd81e920ed5ffc3/src/actinia_core/resources/persistent_processing.py#L477
I…
-
**Is your feature request related to a problem? Please describe.**
Currently we are hosting Open Source Models like Mixtral-8x7B with the Hugging Face Inference Endpoint. With the new tgi 1.4 Version…
-
Hi there :-)
Is there a way to configure multiple users / concurrent request sessions?
I'd like to simulate how the different backends behave with not just 1 user, but e.g. 8 users concurrently a…
-
models can still be yanked, but this should reduce the variability
Here's an example, which uses tgi:
```py
def download_model():
subprocess.run(
[
"text-generation-s…
-
我是用TGI加载本地模型CodeShell-7B-Chat,但是加载过程中报错,我使用的命令如下:
```sh
sudo docker run --gpus 'all' --shm-size 1g -p 9090:80 -v /home/CodeShell/WisdomShell:/data --env LOG_LEVEL="info,text_generation_router=debug"…
-
Building upon the current plotting / textmode enhancements and its isolated yet modular-packaging, consider a "[minimalist](https://github.com/picocomputer/ehbasic-plus/issues/1#issuecomment-189084126…
-
Using TGI or Lorax eetq quantization takes several minutes (Eg 10 minutes for Mixtral) every time the launcher is run .
As a reference bitsandbytes nf4 quant takes 1 minute.
Is there any way to …
-
Hello,
We are using latest main TensorRT LLM and container build with TensorRT-Backend to run Mixtral. Generation doesn't stop and goes until max_tokens is reached. Passing "end_id": 2 doesn't help.
…