-
### Motivation
This is an interesting blog post [FireAttention V2: 12x faster to make Long Contexts practical for Online Inference](https://fireworks.ai/blog/fireattention-v2-long-context-inference…
-
### 📚 The doc issue
There is a typo: ```A larger batch size means a higher throughput at the cost of lower latency.```
The correct version should be: ```A larger batch size means a higher throughput a…
-
FOR ASCEND TORCH_NPU BACKEND:
With the following configuration, the private conv format is disallowed, which reduces format conversions and optimizes the speed of the conv operator. It can also avoid the …
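A minimal sketch of what such a configuration could look like, assuming the `torch_npu` option for disabling internal/private formats; the exact attribute name is an assumption and should be verified against the torch_npu documentation:

```python
# Hedged sketch: disable the private (internal) storage format on Ascend NPU,
# so conv tensors stay in the public format and avoid format conversions.
# The attribute name below is an assumption -- check the torch_npu docs.
import torch
import torch_npu  # Ascend backend for PyTorch

torch.npu.config.allow_internal_format = False  # assumed option name
```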
-
### Feature Idea
Found this comment by @Exploder98 suggesting removing bfloat16, which increased my speed by 50%, by modifying
`supported_inference_dtypes = [torch.bfloat16, torch.float16, torch.floa…
-
**What would you like to be added/modified**:
Sedna is an edge-cloud synergy AI project incubated in KubeEdge SIG AI. Benefiting from the edge-cloud synergy capabilities provided by KubeEdge, Sed…
-
### Type
new chapter
### Chapter/Page
Something else
### Description
Training or running inference on models is fairly easy when we have a smaller number of parameters. But when the scale of…
-
I want to know:
1. the supported model list (a table that includes hardware, backend, model name, dtype, and optimization techniques)
2. does the project support serving with concurrency? That is, if many clients sent …
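One simple way to probe concurrent serving is to fire several client requests in parallel and check that each one gets a response. A minimal sketch; `send_request` is a stand-in for the real HTTP call to the inference service (the endpoint and payload shape are assumptions):

```python
# Hedged sketch: simulate many clients hitting an inference service at once.
from concurrent.futures import ThreadPoolExecutor

def send_request(prompt: str) -> str:
    # Placeholder for the real call, e.g.
    # requests.post(SERVER_URL, json={"prompt": prompt}).json();
    # simulated here so the sketch is self-contained.
    return f"echo: {prompt}"

prompts = [f"query {i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order, so responses line up with prompts.
    results = list(pool.map(send_request, prompts))

print(len(results))  # one response per concurrent client
```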
-
![image](https://github.com/user-attachments/assets/7c11d357-35b5-4b69-8cfd-f3f4112fcd4c)
As shown in the picture, all outputs of the inference are "!".
I tried different approaches and found that I …
-
### Content Type
Article
### Article Description
- How to set up and configure containers for GPU-intensive tasks such as LLM inference or fine-tuning.
- Demo project as example and proof of…
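For the container-setup part, a minimal command-line sketch of exposing a GPU to a container; the image and model names are examples, not the article's actual stack:

```shell
# Hedged sketch: run an OpenAI-compatible LLM server in a GPU container.
# Requires the NVIDIA Container Toolkit; image/model below are examples.
docker run --gpus all --rm -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model facebook/opt-125m
```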
-
### OpenVINO Version
openvino : 2024.3.0
### Operating System
Windows System
### Device used for inference
iGPU
### OpenVINO installation
PyPi
### Programming Language
Python
### Hardware Ar…