-
### 🐛 Describe the bug
At main - `python benchmarks/dynamo/timm_models.py --performance --cold-start-latency --inference --bfloat16 --backend inductor --disable-cudagraphs --device cuda --only=pnasne…
-
Hello!
I've successfully run my yolov8 model on DeepStream following the recommendations of this repository. Nevertheless, I'm now having trouble understanding how to extract metadata (https://gi…
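For reference, metadata in DeepStream is usually read inside a GStreamer pad probe via the standard Python bindings (pyds). Below is a minimal sketch of that pattern; the probe name is our own, and the exact fields you need depend on your pipeline, so treat it as a starting point rather than the repository's method.

```python
# Hypothetical sketch: reading object metadata in a pad probe, assuming the
# standard DeepStream Python bindings (pyds). Attach this probe to the sink
# pad of an element downstream of nvinfer (e.g. nvdsosd).
def osd_sink_pad_buffer_probe(pad, info):
    import pyds                     # ships with the DeepStream SDK, not PyPI
    from gi.repository import Gst

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    # Batch metadata is attached to the buffer by nvstreammux.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            # Each detected object carries a class id, confidence and bbox.
            print(obj_meta.class_id, obj_meta.confidence,
                  obj_meta.rect_params.left, obj_meta.rect_params.top)
            l_obj = l_obj.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK
```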
-
[Jlama](https://github.com/tjake/Jlama) is a fast, modern Java library for running many LLMs.
Jlama is built on Java 21 and utilizes the [Panama Vector API](https://openjdk.org/jeps/448) for fast infe…
-
The curator calls the tree inference method `"Tree type"`. [Nexsons](https://github.com/OpenTreeOfLife/phylesystem-api/wiki/NexSON#table-i-predicate-vocabulary) call this `"ot:curatedType"`.
The nex…
-
### Feature request
Is it possible to run inference on the model separately through encoder.onnx and decoder.onnx?
### Motivation
Is it possible to run inference on the model separately through encoder.onnx and decoder.o…
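Running the two ONNX files as separate sessions is straightforward with onnxruntime. A minimal sketch, assuming the encoder has a single output that feeds the decoder's single input; the actual tensor names vary per export, so query them with `get_inputs()`/`get_outputs()` rather than hard-coding:

```python
# Two-stage inference sketch with onnxruntime; file paths and the
# one-output-feeds-one-input assumption are ours, not the library's.
def run_encoder_decoder(encoder_path, decoder_path, feats):
    import onnxruntime as ort  # pip install onnxruntime

    enc = ort.InferenceSession(encoder_path, providers=["CPUExecutionProvider"])
    dec = ort.InferenceSession(decoder_path, providers=["CPUExecutionProvider"])

    # Run the encoder alone, then feed its output into the decoder.
    enc_in = enc.get_inputs()[0].name
    (enc_out,) = enc.run(None, {enc_in: feats})

    dec_in = dec.get_inputs()[0].name
    (dec_out,) = dec.run(None, {dec_in: enc_out})
    return dec_out
```

If either model has multiple inputs or outputs (decoders often take past key/values as well), extend the feed dictionaries accordingly.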
-
Llama.cpp now supports distributing inference across multiple devices to boost speed; this would be a great addition to Ollama.
https://github.com/ggerganov/llama.cpp/tree/master/examples/rpc
https://www.re…
-
For the example in this page: https://github.com/mit-han-lab/llm-awq/tree/main/tinychat#usage
You can easily run inference on images:
```
python vlm_demo_new.py \
    --model-path VILA1.5-13b-AWQ \
    …
```
-
This will require some core changes to how distributed inference works, hence the higher bounty of $500.
This would be a great contribution to exo.
-
Unable to install forge using the Advanced installation guide. (Install over Automatic) Another user posted this issue on https://github.com/continue-revolution/sd-webui-animatediff/issues/549, but I …
-
Tried the [instructions](https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#create-a-virtual-environment) on Windows, but they only produce failures.
```
> pip install …