New session option `session.disable_cpu_ep_fallback` to disable the default CPU EP fallback
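A minimal sketch of setting this option from Python. The helper name `disable_cpu_ep_fallback` is hypothetical, and the value `"1"` is an assumption based on how other boolean session config entries are enabled; the release note itself only names the key. The helper works on any object that exposes `add_session_config_entry`, which `onnxruntime.SessionOptions` does.

```python
# Config key named in the release note above.
DISABLE_CPU_EP_FALLBACK = "session.disable_cpu_ep_fallback"


def disable_cpu_ep_fallback(session_options):
    """Set the config entry on an onnxruntime.SessionOptions-like object.

    Assumption: the string "1" turns the option on.
    """
    session_options.add_session_config_entry(DISABLE_CPU_EP_FALLBACK, "1")
    return session_options


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed; skipping the demo")
    else:
        opts = disable_cpu_ep_fallback(ort.SessionOptions())
        # Read the entry back to confirm it was stored.
        print(opts.get_session_config_entry(DISABLE_CPU_EP_FALLBACK))
```

With the fallback disabled, a session that cannot place every node on the explicitly requested execution providers is expected to fail at creation instead of silently running part of the graph on CPU.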
Java
Support for fp16 and bf16 tensors as inputs and outputs, along with utilities to convert between these and fp32 data. On JDK 20 and newer the fp16 conversion methods use the JDK's Float.float16ToFloat and Float.floatToFloat16 methods which can be hardware accelerated and vectorized on some platforms.
Support for external initializers so that large models can be instantiated without filesystem access
C#
Expose the OrtValue API as the new preferred API to run inference in C#. This reduces garbage and exposes direct native memory access via Slice-like interfaces.
Make Float16 and BFloat16 full-featured fp16 interfaces that support conversion and expose floating-point properties (e.g. IsNaN, IsInfinity, etc.)
C++
Make Float16_t and BFloat16_t full-featured fp16 interfaces that support conversion and expose floating-point properties (e.g. IsNaN, IsInfinity, etc.)
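To make the fp16/bf16 conversion and property checks mentioned in the Java, C# and C++ items concrete, here is a small illustrative sketch in Python. It is not the onnxruntime API: all function names are hypothetical, and the bf16 rounding mode (round to nearest, ties to even) is an assumption chosen for the example.

```python
import math
import struct


def float_to_fp16_bits(value: float) -> int:
    """Round to IEEE 754 binary16 and return the 16-bit pattern.

    struct raises OverflowError for finite values outside the fp16 range.
    """
    return struct.unpack("<H", struct.pack("<e", value))[0]


def fp16_bits_to_float(bits: int) -> float:
    """Widen a binary16 bit pattern to a Python float (exact)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_bf16_bits(value: float) -> int:
    """Round to bfloat16, i.e. the upper 16 bits of binary32."""
    bits32 = struct.unpack("<I", struct.pack("<f", value))[0]
    if math.isnan(value):
        # Keep NaN a quiet NaN instead of letting rounding touch it.
        return (bits32 >> 16) | 0x0040
    # Round to nearest, ties to even, on the 16 bits being dropped.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits: int) -> float:
    """Widen a bfloat16 bit pattern by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def fp16_is_nan(bits: int) -> bool:
    """Exponent all ones and a non-zero mantissa."""
    return (bits & 0x7C00) == 0x7C00 and (bits & 0x03FF) != 0


def fp16_is_infinity(bits: int) -> bool:
    """Exponent all ones and a zero mantissa, either sign."""
    return (bits & 0x7FFF) == 0x7C00
```

Widening fp16 or bf16 to fp32 is always exact; narrowing is where precision is lost, which is why the property checks operate on the raw 16-bit pattern.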
Performance
Improve LLM quantization accuracy with smoothquant
Support 4-bit quantization on CPU
Optimize BeamScore to improve BeamSearch performance
Add FlashAttention v2 support for Attention, MultiHeadAttention and PackedMultiHeadAttention ops
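For readers unfamiliar with 4-bit quantization, the following is a generic sketch of asymmetric per-block quantization to 4-bit codes. It is not ONNX Runtime's kernel or storage layout: the function names, the per-block scale and zero point scheme, and the low-nibble-first packing are all assumptions made for illustration.

```python
def quantize_block_4bit(values):
    """Quantize one block of floats to codes in 0..15.

    Returns (codes, scale, zero_point). The range is widened to include 0.0
    so that zero is exactly representable.
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 15 or 1.0  # avoid a zero scale for all-zero blocks
    zero_point = round(-lo / scale)
    codes = [min(15, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_block_4bit(codes, scale, zero_point):
    """Map 4-bit codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


def pack_nibbles(codes):
    """Pack two 4-bit codes per byte, low nibble first (zero padded)."""
    padded = list(codes) + [0] * (len(codes) % 2)
    return bytes(padded[i] | (padded[i + 1] << 4) for i in range(0, len(padded), 2))
```

Each value in a block is reconstructed to within roughly half a quantization step (`scale / 2`), so smaller blocks with their own scale generally trade extra metadata for lower error.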
Execution Providers
CUDA EP
Initial fp8 support (QDQ, Cast, MatMul)
Relax CUDA Graph constraints so that more models can use it
Allow CUDA allocator to be registered with ONNX Runtime externally
TensorRT EP
CUDA Graph support
Support user-provided CUDA compute stream
Misc bug fixes and improvements
OpenVINO EP
Support OpenVINO 2023.1
QNN EP
Enable context binary cache to reduce initialization time
Support QNN 2.12
Support for resize with asymmetric transformation mode on HTP backend
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Bumps onnxruntime from 1.15.1 to 1.16.1.
Release notes
Sourced from onnxruntime's releases.
... (truncated)
Commits
- 2a1fd25 Upgrade transformers to fix CI (#17830)
- c3fd281 Fix onnx quantizer activation and weight type attribute
- f480a36 [hotfix] fix session option access in Node.js binding (#17762)
- 6df4211 Cancel EP check in python for 1.16.1 (#17768)
- 264a740 Cherry-picks for 1.16.1 release (#17741)
- e7a0495 Cherry-picks pipeline changes to 1.16.0 release branch (#17577)
- 06ea28b [rel-1.16.0] Cherry-pick 16940 and 17523 (#17506)
- 0772d54 [rel-1.16.0] Cherry-pick 17507 (#17520)
- a9df3ae Remove 52 from CMAKE_CUDA_ARCHITECTURES to reduce Nuget package size (#17461)
- 196df08 [rel-1.16.0] Disable QNN QDQ test for release branch (#17463)