-
**Describe the bug**
When I load a GPT-2 model with onnxruntime-gpu, many warnings appear, showing that some nodes will be computed on the CPU.
Is this expected, or did I do something wrong when c…
-
This issue will be used to track compilation failures for migraphx models on CPU and GPU. Compile failures for each model should have a link to an issue with a smaller reproducer in the notes column.
…
-
Hi again @golsun,
I've been working with DialogRPT using DialoGPT-large for dialog generation and have hit some performance issues that aren't present when using just DialoGPT-large. Round trip res…
-
Hello Team,
I am trying to execute the GPT-2 model (link given below) on a Mali G710 GPU. During execution I get the error below:
./ExecuteNetwork -c GpuAcc -f onnx-binary -d /mnt/dropbox/Mobi…
-
## Description
The release notes at https://github.com/dmlc/gluon-nlp/releases/tag/v0.8.1 say that BERT int8 quantization is presented in the blog post
https://medium.com/apache-mxnet/optimization-for-bert-inference-per…
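For context, the core idea behind int8 post-training quantization is mapping float32 values onto 8-bit integers via a per-tensor scale. A minimal, library-free sketch (illustrative only; real BERT int8 inference relies on fused MKL-DNN/oneDNN kernels, not this code):

```python
# Symmetric per-tensor int8 quantization sketch (illustrative only;
# production int8 BERT uses fused MKL-DNN/oneDNN kernels).

def quantize_int8(values):
    """Map float values to int8 using a symmetric per-tensor scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.031, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

The quantization error is bounded by half the scale per element, which is why int8 works well for weight tensors with a moderate dynamic range.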
-
### Describe the issue
I was intrigued by @tianleiwu's [excellent blog post](https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-244357…
-
## Summary of Contributions (9th Feb)
1) **Increase the number of models in TorchBench that work with Dynamo as a tracer:** These passing rates are now comparable to those from torch.compile using I…
-
**Is your feature request related to a problem? Please describe.**
Defaulting to a high number of GPU layers doesn't always work. For instance, big models can overflow the card's memory and constrain the us…
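A back-of-the-envelope check for how many layers actually fit in VRAM illustrates the problem. A hedged sketch (the per-layer size and overhead figures are made-up placeholders, not measured values for any real model):

```python
def max_gpu_layers(vram_bytes, layer_bytes, overhead_bytes, total_layers):
    """Estimate how many transformer layers fit in VRAM.

    All sizes are placeholders: layer_bytes is the (model-specific)
    memory cost of one offloaded layer; overhead_bytes covers the
    KV cache and scratch buffers.
    """
    usable = vram_bytes - overhead_bytes
    if usable <= 0:
        return 0
    return min(total_layers, usable // layer_bytes)

# Hypothetical 7B-class model (32 layers, ~160 MiB/layer) on an 8 GiB card.
GiB = 1024 ** 3
print(max_gpu_layers(8 * GiB, 160 * 1024 ** 2, 1 * GiB, 32))
```

A default derived from such an estimate, rather than a fixed high layer count, would avoid overflowing smaller cards.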
-
- [ ] [Measuring inference speed metrics for hosted and local LLM](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/client/src/c%2B%2B/perf_analyzer/genai-perf/README.html)…
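The metrics such tools report (time to first token, inter-token latency, throughput) all fall out of raw token arrival timestamps. A minimal sketch of that derivation (the names and numbers are illustrative, not genai-perf's implementation):

```python
def speed_metrics(request_start, token_times):
    """Derive common LLM latency metrics from per-token arrival times.

    request_start: wall-clock time the request was sent (seconds).
    token_times:   wall-clock arrival time of each generated token.
    """
    ttft = token_times[0] - request_start        # time to first token
    total = token_times[-1] - request_start      # end-to-end latency
    n = len(token_times)
    # Mean gap between consecutive tokens after the first one.
    itl = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    throughput = n / total                       # tokens per second
    return {"ttft_s": ttft, "itl_s": itl, "tok_per_s": throughput}

# Illustrative timestamps: first token after 0.5 s, then one every 50 ms.
m = speed_metrics(0.0, [0.5 + 0.05 * i for i in range(10)])
print(m)
```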
-
Thanks for open-sourcing the code!
This approach is very interesting, but I'm curious about the impact on performance (inference speed).
**Is there any benchmark showing the impact on performan…
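In the absence of a published benchmark, a quick wall-clock comparison with and without the change would answer this. A hedged sketch of such a harness (`dummy_step` is a stand-in workload, not this repo's API):

```python
import time

def benchmark(fn, *args, warmup=2, iters=5):
    """Time fn(*args): run warmup calls first, then report mean seconds/call."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Stand-in workload; in practice, substitute the model's generate() call.
def dummy_step(n):
    return sum(i * i for i in range(n))

mean_s = benchmark(dummy_step, 10_000)
print(f"mean latency: {mean_s * 1e3:.3f} ms/call")
```

Running the same harness on the baseline and the modified model gives the per-call slowdown directly.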