-
The basic idea is to create something low-level, like LLVM Bitcode or WebAssembly, so that HDL compilers emit code in this format, which is then fed to the routers/synthesizers. This will …
-
I want to run `benchmarks/gptManagerBenchmark`; it seems a file is used to generate the input_ids.
Is there an example of this?
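The excerpt does not show the file itself, but here is a hypothetical sketch of producing one, assuming a JSON list of records each holding `input_ids` and an output length; the field names and layout are assumptions, and the actual schema is defined by the dataset-preparation script that ships in the repo's benchmarks directory, which should be treated as authoritative:

```python
import json

from transformers import AutoTokenizer

# Tokenize a few prompts into token-id lists. The record schema below
# ("input_ids", "output_len") is a hypothetical placeholder, not the
# confirmed gptManagerBenchmark format.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
prompts = [
    "Summarize the plot of Hamlet.",
    "Explain KV caching in one sentence.",
]

records = [
    {"input_ids": tokenizer.encode(p), "output_len": 128}
    for p in prompts
]

with open("benchmark_input.json", "w") as f:
    json.dump(records, f)
```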
-
I am testing the trtllm backend v0.6.0 for llama2-7b with the setup below. The code snippet is as follows. If I send one request with a length of about 1000, it takes about 2-3 seconds to finish. And if I send…
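For reference, a minimal sketch of timing a single request against Triton's HTTP `generate` endpoint (available since Triton 23.10); the model name `ensemble` and the `text_input`/`max_tokens` fields assume the default trtllm ensemble configuration, and some ensembles require additional fields such as stop/bad words:

```python
import time

import requests

payload = {
    "text_input": "Once upon a time",  # prompt text
    "max_tokens": 1000,                # generation length under test
}

start = time.perf_counter()
resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json=payload,
)
elapsed = time.perf_counter() - start
print(f"status={resp.status_code} latency={elapsed:.2f}s")
```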
-
If the client closes the connection while the server is still generating, the server crashes with a segmentation fault. 100% reproducible.
-
I see your Node.js binding uses Neon. But have you considered WebAssembly? There are tools that make compiling Rust code to it easy, so you would get browser compatibility and Node v13 support with a low impact on …
-
How joins are currently handled:
We refer to a Parent Triples Map, but in fact we use the Subject Map of that Triples Map.
```
rml:logicalSource ;
rml:subjectMap ;
rml:predica…
```
-
### Summary
The `media_type` is not inferred correctly when rendering a Jinja template with a file extension other than `.html`, such as `.jinja`, `.jinja2`, or `.j2`.
So instead of passing a `media_ty…
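Until the inference is fixed, a minimal sketch of the usual workaround is to pass the media type explicitly; this assumes a Starlette-style `Jinja2Templates` setup, since the excerpt does not name the framework:

```python
from starlette.applications import Starlette
from starlette.routing import Route
from starlette.templating import Jinja2Templates

templates = Jinja2Templates(directory="templates")

async def homepage(request):
    # The ".jinja" extension would otherwise be guessed as a non-HTML
    # media type, so pass media_type explicitly.
    return templates.TemplateResponse(
        "index.jinja", {"request": request}, media_type="text/html"
    )

app = Starlette(routes=[Route("/", homepage)])
```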
-
**Description**
I used Triton Inference Server with the trt-llm backend to deploy Baichuan2, but got errors when sending requests.
**Triton Information**
23.10-trtllm-python-py3
Are you using the …
-
I need batch inference, so I set different max_batch_size values, like 1, 64, and 128. I then found that GPU memory use during the inference phase was 27 GB, 49 GB, and 72 GB respectively, so I need at least 72 GB of GPU memory to inference wh…
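This scaling is expected: with in-flight batching, the KV cache is typically pre-allocated for `max_batch_size` sequences, so memory grows roughly linearly with it on top of the fixed weight footprint. A back-of-the-envelope sketch, assuming llama2-7b-like dimensions and fp16; the actual allocation depends on the configured maximum token counts and paging settings, so these numbers are an upper-bound estimate, not the observed values:

```python
# Rough KV-cache size: 2 (K and V) * layers * hidden * bytes per value
# gives bytes per token per sequence; multiply by tokens per sequence
# and by batch size.
num_layers = 32       # llama2-7b
hidden_size = 4096    # num_heads * head_dim = 32 * 128
bytes_per_val = 2     # fp16
max_seq_len = 2048    # assumed maximum sequence length

per_token = 2 * num_layers * hidden_size * bytes_per_val  # ~0.5 MiB
per_seq = per_token * max_seq_len                         # ~1 GiB

for batch in (1, 64, 128):
    print(f"batch={batch}: ~{per_seq * batch / 2**30:.0f} GiB of KV cache")
```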
-
## Tasks
- [x] Eng Spec @dan-jan
- [ ] Local Models
- [ ] Remote Models (KIV)
- [x] janhq/internal#66
- Functionality
- [x] Each Recommended Model should have default values defin…