-
### 🐛 Describe the bug
Out of memory in weekly test, https://github.com/intel/torch-xpu-ops/actions/runs/10218591763
Model list:
- [ ] `GPTJForCausalLM`
- [ ] `GPTJForQuestionAnswering`
- [ ]…
-
### What happened?
With the attached IR, an abort occurs at runtime.
Commands:
```
iree-compile model.modified.mlir --iree-hal-target-backends=llvm-cpu -o compiled_model.vmfb
iree-run-module --modul…
-
Trying to run offline retinanet in a container with one Nvidia GPU:
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=retinanet --implementation=nvidia …
-
Hi,
The `latency` variable can be referenced while still undefined: it is initialized only under this condition:
```
if chunk == "[DONE]":
latency = time.perf_counter() - st
```
But it gets ref…
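A common fix, sketched here under the assumption that the snippet comes from a loop over streamed chunks (the function name and `chunks` argument are hypothetical; only `st`, `latency`, and the `"[DONE]"` sentinel come from the snippet), is to initialize `latency` before the loop and handle the case where the sentinel never arrives:

```python
import time


def measure_stream_latency(chunks):
    """Return the time from start until the '[DONE]' sentinel is seen."""
    st = time.perf_counter()
    latency = None  # initialize up front so a missing sentinel cannot raise UnboundLocalError
    for chunk in chunks:
        if chunk == "[DONE]":
            latency = time.perf_counter() - st
    if latency is None:
        # The stream ended without the sentinel; fail loudly instead of
        # crashing later on an undefined variable.
        raise RuntimeError("stream ended without '[DONE]'; latency not measured")
    return latency
```

The key point is that `latency` gets a value on every code path, so the later reference is always safe.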
-
Hi,
What is the inference time of FD_MobileNet? Could you provide speed benchmarks for the different models mentioned in the paper? Thanks!
-
- Q3 Collaboration Plan of Infra and IaaS Labs: https://bytedance.us.larkoffice.com/docx/HKXfdRh1noMrbAxcgL2ureGasdQ
- FlashDecoding++ Summary: https://bytedance.larkoffice.com/wiki/WbqXwRL3qi0x18k…
-
flashdecoding++ paper: https://arxiv.org/abs/2311.01282
- Q3 Collaboration Plan of Infra and IaaS Labs: https://bytedance.us.larkoffice.com/docx/HKXfdRh1noMrbAxcgL2ureGasdQ
- FlashDecoding++ Su…
-
Hi, I managed to build GLIP and used the following command to test zero-shot on COCO:
`python tools/test_grounding_net.py --config-file configs/pretrain/glip_Swin_L.yaml --weight MODEL/glip_large_mode…
-
**What would you like to be added/modified**:
Based on the current multiedge inference benchmark on ianvs, we would like to extend the multiedge inference on multiple heterogeneous edges (e.g., m…
-
Hi,
I failed to run Llama-2-7b-chat-hf on the NPU; please give me a hand.
1. I converted the model with the command below and got two models,
a) optimum-cli export openvino --task text-generation -m Meta-…