-
Mistral's e2e demo perf (with tracing, embedding/argmax on host, untilizing on device) is 15.2 t/s/u.
Device perf is 22.3 t/s/u, so the e2e:device perf ratio is 68% (15.2 / 22.3).
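The snippet below is a quick check of the numbers above. It recomputes the ratio and converts both throughputs into per-token latency. It assumes t/s/u means tokens per second per user, so one user's per-token latency is 1 / (t/s/u); the variable and function names are illustrative only. Under that assumption, the e2e/device gap is roughly 21 ms per token. That is presumably time the device perf number does not capture, such as host-side work.

```python
# Sketch: recompute the e2e:device ratio and per-token latencies from the quoted numbers.
# Assumption: "t/s/u" = tokens per second per user, so latency per token = 1 / rate.

E2E_TPS_PER_USER = 15.2     # end-to-end demo throughput (tokens/s/user)
DEVICE_TPS_PER_USER = 22.3  # device-only throughput (tokens/s/user)


def perf_ratio(e2e: float, device: float) -> float:
    """Fraction of device throughput that is achieved end to end."""
    return e2e / device


def per_token_latency_ms(tps_per_user: float) -> float:
    """Milliseconds per generated token for a single user."""
    return 1000.0 / tps_per_user


ratio = perf_ratio(E2E_TPS_PER_USER, DEVICE_TPS_PER_USER)
e2e_ms = per_token_latency_ms(E2E_TPS_PER_USER)
device_ms = per_token_latency_ms(DEVICE_TPS_PER_USER)
gap_ms = e2e_ms - device_ms  # time per token not accounted for by device perf

print(f"ratio = {ratio:.0%}")  # ratio = 68%
print(f"e2e {e2e_ms:.1f} ms/token, device {device_ms:.1f} ms/token, gap {gap_ms:.1f} ms/token")
# e2e 65.8 ms/token, device 44.8 ms/token, gap 20.9 ms/token
```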
Dispatch times for 1 decoder layer are…
-
Options to Reduce the Size of the Transcribing Pods
Objective
This document outlines options to reduce the resource allocation (CPU and memory) of the aws-transcribe-esp pods running in an EKS envir…
-
### Run Information
Name | Value
-- | --
Architecture | x64
OS | ubuntu 22.04
Queue | ViperUbuntu
Baseline | [bb1b43a6a155eaeb45953a3c87f6e04b12979778](https://github.com/dotnet/runtime/commit/bb1…
-
## Decode 128
We are aiming to hit 20 t/s/u end to end for Llama3 decode on t3k. The remaining issues are:
![image](https://github.com/user-attachments/assets/ce4a7190-f49d-431b-8f18-79ebb130e36c…
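For reference, and assuming each decode iteration emits one token per user, the 20 t/s/u target corresponds to this per-iteration latency budget:

$$
t_{\text{token}} = \frac{1}{20\ \text{tokens/s/user}} = 0.05\ \text{s} = 50\ \text{ms}
$$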
-
### Which jobs are failing?
ci-kubernetes-integration-master
### Which tests are failing?
k8s.io/kubernetes/test/integration: scheduler_perf
=== RUN TestScheduling/TopologySpreading/500Node…
-
React Tiptap Editor is 11 MB in size after bundling.
-
-
I use `perf -e cpu -a -g` to profile the machine continuously.
What overhead does this add to the profiled processes?
As far as I know, `perf_event_open` causes a context switch in the user program whenever a counter triggers. Is that correct?
-
### Run Information
Name | Value
-- | --
Architecture | x64
OS | ubuntu 22.04
Queue | TigerUbuntu
Baseline | [078530371f31d23945990b47314f11d6e9f1f7c1](https://github.com/dotnet/runtime/co…
-
evalscope perf --url 'http://127.0.0.1:65000/v1/chat/completions' --parallel 128 --model 'Qwen2-72B-Instruct' --log-every-n-query 10 --read-timeout=120 --dataset-path './data/open_qa.jsonl' -n 1 --max…