QDucasse / nn_benchmark

🧠 Benchmark facility to train networks on different datasets for PyTorch/Brevitas
MIT License

Bump onnxruntime from 1.2.0 to 1.6.0 #12

Closed. dependabot-preview[bot] closed this pull request 3 years ago.

dependabot-preview[bot] commented 3 years ago

Bumps onnxruntime from 1.2.0 to 1.6.0.

Release notes

Sourced from onnxruntime's releases.

ONNX Runtime v1.6.0

Announcements

  • OpenMP will be disabled in future official builds (the build option will still be available). A NoOpenMP version of ONNX Runtime is now available with this release on NuGet and PyPI for C/C++/C#/Python users.
  • In the next release, the MKL-ML, openblas, and jemalloc build options will be removed, and the Microsoft.ML.OnnxRuntime.MKLML NuGet package will no longer be published. Users of MKL-ML are recommended to switch to the Intel EPs. If you are using these options and identify issues switching to an alternative build, please file an issue with details.

Key Feature Updates

General

  • ONNX 1.8 support / opset 13
  • New contrib ops: BiasSoftmax, MatMulIntegerToFloat, QLinearSigmoid, Trilu
  • ORT Mobile now compatible with NNAPI for accelerating model execution on Android devices
  • Build support for Mac with Apple Silicon (CPU only)
  • New dependency: flatbuffers
  • Support for loading sparse tensor initializers in pruned models
  • Support for setting the execution priority of a node
  • Support for selection of cuDNN conv algorithms
  • BERT Model profiling tool

Performance

  • New session option to disable denormal floating-point numbers on CPUs with SSE3 support (see the sketch after this list)
    • Eliminates unexpected performance degradation due to denormals without needing to retrain the model
  • Option to share initializers between sessions to improve memory utilization
    • Useful when several models loaded in the same process share the same set of initializers except for the last few layers
    • Eliminates wasteful memory usage when every model (session) creates a separate instance of the same initializer
    • Exposed by the AddInitializer API (also shown in the sketch after this list)
  • Transformer model optimizations
    • Longformer: LongformerAttention CUDA operator added
    • Support for BERT models exported from TensorFlow with 1 or 2 inputs
    • Python optimizer supports additional models: openai-GPT, ALBERT and FlauBERT
  • Quantization
    • Support of per-channel QuantizeLinear and DeQuantizeLinear
    • Support of LSTM quantization
    • Quantization performance improvement on ARM
    • CNN quantization performance optimizations, including u8s8 support and the NHWC transformer in QLinearConv
    • QLinearConv optimization with model transform
  • ThreadPool
    • Use _mm_pause() for spin loop to improve performance and power consumption
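
The sketch below illustrates the two session-level features above. It is a minimal Python example, not taken from this repository, assuming the onnxruntime 1.6 Python API; the model paths, the initializer name `fc.weight`, and the tensor shape are hypothetical placeholders.

```python
# Minimal sketch (hypothetical model and initializer names) of two 1.6 features:
# treating denormal floats as zero via a session config entry, and sharing a
# single initializer instance between sessions with SessionOptions.add_initializer.
import numpy as np
import onnxruntime as ort

# One numpy buffer backs the initializer for every session that uses it.
shared_weight = np.zeros((1024, 1024), dtype=np.float32)          # hypothetical shape
shared_ortvalue = ort.OrtValue.ortvalue_from_numpy(shared_weight)

def make_session(model_path):
    so = ort.SessionOptions()
    # Flush denormal floating-point numbers to zero to avoid slow CPU paths.
    so.add_session_config_entry("session.set_denormal_as_zero", "1")
    # Pre-supply the initializer so the session reuses this instance instead of
    # loading its own copy from the model file ("fc.weight" is hypothetical).
    so.add_initializer("fc.weight", shared_ortvalue)
    return ort.InferenceSession(model_path, sess_options=so)

# Two models that share everything but their last few layers (hypothetical files).
sess_a = make_session("model_a.onnx")
sess_b = make_session("model_b.onnx")
```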

APIs and Packages

  • Python - I/O Binding enhancements
    • Usage Documentation (OrtValue and IOBinding sections)
    • Python binding for the OrtValue data structure
      • An interface is exposed to allocate memory on a CUDA-supported device and define the contents of this memory. This removes the need to rely on allocators from other libraries to allocate and manage the CUDA memory used with ORT.
      • Allows consuming ORT allocated device memory as an OrtValue (check Scenario 4 in the IOBinding section of the documentation for an example)
    • OrtValue instances can be used to bind inputs/outputs. This is in addition to the existing interfaces that bind a piece of memory directly or bind numpy arrays, and it is particularly useful when binding ORT-allocated device memory (see the sketch after this list).
  • C# - float16 and bfloat16 support
  • Windows ML
    • NuGet package now supports UWP applications targeting Windows Store deployment for both CPU and GPU
    • Minor API Improvements:
      • Able to bind IIterable as inputs and outputs
      • Able to create Tensor* via multiple buffers
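
As a concrete illustration of the I/O Binding enhancements above, here is a minimal Python sketch assuming the onnxruntime 1.6 Python API with the CUDA execution provider; the model file and the tensor names `input`/`output` are hypothetical placeholders.

```python
# Minimal I/O binding sketch (hypothetical model and tensor names): the input is
# placed on the CUDA device as an OrtValue, the output is allocated there by ORT,
# and the result is copied back to the CPU after the run.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

x = np.random.rand(1, 3, 224, 224).astype(np.float32)        # hypothetical input shape
x_ortvalue = ort.OrtValue.ortvalue_from_numpy(x, "cuda", 0)   # copy the input to GPU 0

io_binding = sess.io_binding()
io_binding.bind_ortvalue_input("input", x_ortvalue)           # bind input via OrtValue
io_binding.bind_output("output", "cuda")                      # let ORT allocate the output on GPU

sess.run_with_iobinding(io_binding)
result = io_binding.copy_outputs_to_cpu()[0]                  # numpy array on the CPU
```
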
Commits
  • 718ca7f Second round of cherry-pick (#6083)
  • c38f762 work around of the build break in mac (#6069)
  • d19ad2c Regenerate CI build docker images
  • cbb4e5d Update cgmanifest.json and onnx submodule
  • c3f6a1e Add missing file
  • fdebbee Fix build.py bug which prevents running some unit tests (#5990)
  • 455a6b8 Revert python 3.8 changes
  • 3013289 update onnx submuole to 994c6181247d7b419b28889fc57d5817e2089419 (#6042)
  • 095d55b Cherry picking for Rel-1.6 (#6006)
  • 846c5fb Report arm64 minimal baseline binary size only for continuous integration (#5...
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
  • `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
  • `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
  • `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
  • `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot [dashboard](https://app.dependabot.com):

  • Update frequency (including time of day and day of week)
  • Pull request limits (per update run and/or open at any time)
  • Out-of-range updates (receive only lockfile updates, if desired)
  • Security updates (receive only security updates, if desired)
dependabot-preview[bot] commented 3 years ago

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.