-
Hi experts,
I'm using the latest TensorRT-LLM code for inference with my Llama2-like model (SmoothQuant), which was tuned from Llama2 to support long context (32k), but I observed that the performance may not be quit…
-
### System Info
Hi,
I'm having trouble reproducing the NVIDIA-claimed numbers in the table here: https://nvidia.github.io/TensorRT-LLM/performance/perf-overview.html#throughput-measurements
System Im…
-
### System Info
- tensorrtllm_backend built using Dockerfile.trt_llm_backend
- main branch TensorRT-LLM (0.13.0.dev20240813000)
- 8xH100 SXM
- Driver Version: 535.129.03
- CUDA Version: 12.5
…
-
- [ ] Run `ollama pull llama2` in the docker-compose command
-
I am running genai-stack on my **Mac** and getting this error when I run `docker-compose up --build`:
pull-model-1 | pulling ollama model llama2 using http://host.docker.internal:11434
pull-model-…
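In case it helps others debugging this on macOS: the log above shows the `pull-model` service reaching the host's Ollama via `http://host.docker.internal:11434`. A minimal compose override sketch along those lines is below; the service name and the `OLLAMA_BASE_URL` variable are assumptions taken from the log and from common genai-stack setups, not verified against the repo:

```yaml
# Hypothetical docker-compose override -- service and variable
# names are assumptions, adjust to match your genai-stack checkout.
services:
  pull-model:
    environment:
      # Point the container at the Ollama server running on the host.
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      # Needed on Linux; on Docker Desktop for Mac,
      # host.docker.internal already resolves.
      - "host.docker.internal:host-gateway"
```

Running `ollama pull llama2` on the host first, so the model is already present when compose starts, may also sidestep the failing pull step.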
-
running:
`cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1 --model=llama2-70b-99 --implementation=reference --framework=pytorch --category=datacenter --scenari…
-
Hi,
I have tried adding phi3-3.8b as an Ollama model, hosted on my own on-prem Ollama server.
I have basically copied the prompt template and parameters from microsoft/Phi-3-mini-4k-instruct used in h…
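For reference, a minimal Ollama Modelfile sketch for a Phi-3-style chat template. The template and parameter values here are placeholders illustrating the format, not the verified settings from microsoft/Phi-3-mini-4k-instruct:

```
# Hypothetical Modelfile sketch -- template and parameter values
# are assumptions, not the verified Phi-3 defaults.
FROM phi3:3.8b

TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
"""

PARAMETER stop <|end|>
PARAMETER temperature 0.7
```

A model built from such a file is registered with `ollama create my-phi3 -f Modelfile` and then served like any other Ollama model.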
-
I installed Llama2 and Llama3 through Ollama on Windows; Danswer is also installed on Windows.
![image](https://github.com/danswer-ai/danswer/assets/106233935/6b2a3594-dd52-40e9-8dd7-74530d384ffe)
![image](…
-
I hit the error below in the datacenter category, llama2-70b-99.9 model, Offline scenario, in a Docker container, using this command script:
cm run script --tags=run-mlperf,inference,_find-performance,_fu…
-
Hey. Not a computer scientist here, but I thought you'd like to know that the latest pushed container image is causing issues with GPU inference for me.
System specs
CPU: AMD Ryzen 3600
GPU: I…