-
While looking into why one of the pipelines was stuck in the `LABEL_INFERENCE` stage, I found that we appear to re-load the full trip model before each trip inference.
This is:
- unnecessary, si…
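A minimal sketch of the obvious fix: cache the loaded model at process level so it is deserialized once rather than before every trip inference. The names `get_trip_model` and `_expensive_load` are placeholders, not the pipeline's actual API.

```python
import functools

def _expensive_load(path):
    # Stand-in for the real model deserialization (an assumption).
    return {"path": path, "loaded": True}

@functools.lru_cache(maxsize=1)
def get_trip_model(path="trip_model.pkl"):
    # First call loads the model; later calls return the cached object.
    return _expensive_load(path)

model_a = get_trip_model()
model_b = get_trip_model()
assert model_a is model_b  # loaded once, reused for every inference
```

Any equivalent memoization (a module-level singleton, an instance attribute on the pipeline object) would serve the same purpose.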
-
Instead of pressing a key, continuously listen until the wake word is announced (i.e., "Hey Ross")
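The requested loop can be sketched as follows. This is only an illustration of the control flow: `listen_for_wake_word` and the snippet stream are hypothetical, and a real implementation would feed transcribed audio from a microphone rather than an in-memory list.

```python
WAKE_WORD = "hey ross"

def listen_for_wake_word(snippets, wake_word=WAKE_WORD):
    # Consume transcribed snippets until one contains the wake word;
    # everything before that point is ignored instead of requiring a
    # key press to start listening.
    for text in snippets:
        if wake_word in text.lower():
            return text
    return None

# Simulated transcription stream standing in for live audio.
stream = iter(["turn on the lights", "Hey Ross, what's the weather?"])
assert listen_for_wake_word(stream) == "Hey Ross, what's the weather?"
```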
-
### 🐛 Describe the bug
This config works:
```
vmargs=-XX:+UseContainerSupport -XX:InitialRAMPercentage=25.0 -XX:MaxRAMPercentage=100.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
i…
-
There is no error locally or in a test submit (`nsml submit -t ~~`), but in the actual submission an error occurs after inference finishes, as shown below.
Could you tell me why? Access to the infer side is restricted, so it is hard to locate the error or find its cause.
Building docker image. It may take a while
.........lo…
-
### System Info
V100*2
nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3
tensorrt-llm 0.7.0
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own mo…
-
Hi,
I am very interested in the distributed inference of Colossal AI. Since we have pre-trained NLP models from PyTorch or JAX, I wonder if it is possible, or what should be done, to use EnergonAI for infere…
-
**Is this a BUG REPORT or FEATURE REQUEST?**:
> Uncomment only one, leave it on its own line:
>
> /kind bug
> /kind feature
**What happened**:
Investigate if we can use https://github.…
-
## Bug description
In README.md, it's stated that the prompts used in production for HuggingChat can be found in PROMPTS.md.
However, PROMPTS.md has not been updated for 7 months and there are s…
-
## Is your feature request related to a problem? Please describe.
Currently, TorchServe's sanity suite, regression suite, and the recent changes related to logging [GPU info in the model description]…
-
Whenever I try to run `script.py` or follow the instructions here: https://blog.roboflow.com/how-to-deploy-cogvlm/
I always get this result: `{'message': 'Internal error.'}`
Using Gradio also return…