bit-bots / YOEO

YouOnlyEncodeOnce - A CNN for Embedded Object Detection and Semantic Segmentation
GNU General Public License v3.0

Bump onnxruntime-gpu from 1.8.1 to 1.9.0 #11

Closed · dependabot[bot] closed 2 years ago

dependabot[bot] commented 2 years ago

Bumps onnxruntime-gpu from 1.8.1 to 1.9.0.

Release notes

Sourced from onnxruntime-gpu's releases.

ONNX Runtime v1.9.0

Announcements

  • GCC version < 7 is no longer supported
  • CMAKE_SYSTEM_PROCESSOR needs to be set when cross-compiling on Linux, because pytorch cpuinfo was introduced as a dependency for ARM big.LITTLE support. Set it to the output of `uname -m` on your target device.

General

  • ONNX 1.10 support
    • opset 15
    • ONNX IR 8 (SparseTensor type and model-local function protos; the Optional type is not yet fully supported in this release)
  • Improved documentation of C/C++ APIs
  • IBM Power support
  • WinML - DLL dependency fix supports learning models on Windows 8.1
  • Support for sub-building onnxruntime-extensions and statically linking into onnxruntime binary for custom builds
    • Add --use_extensions option to run models with custom operators implemented in onnxruntime-extensions

APIs

  • Registration of a custom allocator for sharing between multiple sessions. (See RegisterAllocator and UnregisterAllocator APIs in onnxruntime_c_api.h)
  • SessionOptionsAppendExecutionProvider_TensorRT API is deprecated; use SessionOptionsAppendExecutionProvider_TensorRT_V2
  • New APIs: SessionOptionsAppendExecutionProvider_TensorRT_V2, CreateTensorRTProviderOptions, UpdateTensorRTProviderOptions, GetTensorRTProviderOptionsAsString, ReleaseTensorRTProviderOptions, EnableOrtCustomOps, RegisterAllocator, UnregisterAllocator, IsSparseTensor, CreateSparseTensorAsOrtValue, FillSparseTensorCoo, FillSparseTensorCsr, FillSparseTensorBlockSparse, CreateSparseTensorWithValuesAsOrtValue, UseCooIndices, UseCsrIndices, UseBlockSparseIndices, GetSparseTensorFormat, GetSparseTensorValuesTypeAndShape, GetSparseTensorValues, GetSparseTensorIndicesTypeShape, GetSparseTensorIndices

Performance and quantization

  • Performance improvement on ARM
    • Added S8S8 (signed int8, signed int8) matmul kernel. This avoids extending uint8 to int16, for better performance on ARM64 CPUs without dot-product instructions
    • Expanded GEMM udot kernel to 8x8 accumulator
    • Added sgemm and qgemm optimized kernels for ARM64EC
  • Operator improvements
    • Improved performance for quantized operators: DynamicQuantizeLSTM, QLinearAvgPool
    • Added new quantized operator QGemm for quantizing Gemm directly
    • Fused HardSigmoid and Conv
  • Quantization tool - subgraph support
  • Transformers tool improvements
    • Fused Attention for BART encoder and Megatron GPT-2
    • Integrated mixed precision ONNX conversion and parity test for GPT-2
    • Updated graph fusion for embed layer normalization for BERT
    • Improved symbolic shape inference for operators: Attention, EmbedLayerNormalization, Einsum and Reciprocal

Packages

  • Official ORT GPU packages (except Python) now include both CUDA and TensorRT Execution Providers.
    • Python packages will be updated next release. Please note that EPs should be explicitly registered to ensure the correct provider is used.
  • GPU packages are built with CUDA 11.4 and should be compatible with 11.x on systems with the minimum required driver version. See: CUDA minor version compatibility
  • Pypi
    • ORT + DirectML Python packages now available: onnxruntime-directml
    • GPU package can be used on both CPU-only and GPU machines
  • Nuget
    • C#: Added support for using netstandard2.0 as a target framework
    • Windows symbol (PDB) files are now shipped in a separate Nuget package, reducing the size of the binary Nuget package by 85%
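Since the 1.9 GPU packages bundle both the CUDA and TensorRT Execution Providers, the note above recommends registering EPs explicitly rather than relying on a default. A minimal sketch of what that looks like from Python (the model path is hypothetical, and the session-creation lines are shown commented out because they require `onnxruntime-gpu` and a model file to be present):

```python
# Execution providers in priority order: ONNX Runtime tries each one in
# turn and falls back to the next if a provider's libraries are not
# available on the machine.
preferred_providers = [
    "TensorRTExecutionProvider",  # needs TensorRT libraries installed
    "CUDAExecutionProvider",      # needs CUDA 11.x and a compatible driver
    "CPUExecutionProvider",       # always-available fallback
]

# With onnxruntime-gpu >= 1.9 installed, a session would be created as:
#   import onnxruntime as ort
#   session = ort.InferenceSession("model.onnx", providers=preferred_providers)
#   session.get_providers()  # lists the providers actually in use
```

Listing "CPUExecutionProvider" last keeps inference working on CPU-only machines, which matters here because the same GPU wheel can be used on both CPU-only and GPU hosts.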

Execution Providers

  • CUDA EP

... (truncated)

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Flova commented 2 years ago

@dependabot rebase