Anush008 / fastembed-js

Library to generate vector embeddings in NodeJS
https://www.npmjs.com/package/fastembed/
MIT License

build: Upgrade onnxruntime-node from 1.15.1 to 1.16.0 #6

Closed · Anush008 closed this 10 months ago

Anush008 commented 10 months ago

This PR was automatically created by Snyk using the credentials of a real user.


Snyk has created this PR to upgrade onnxruntime-node from 1.15.1 to 1.16.0.

ℹ️ Keep your dependencies up-to-date. This makes it easier to fix existing vulnerabilities and to more quickly identify and fix newly disclosed vulnerabilities when they affect your project.

- The recommended version is **5 versions** ahead of your current version.
- The recommended version was released **22 days ago**, on 2023-09-19.
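
A minimal smoke test for the upgraded package might look like the sketch below; the model path, the `[1, 3]` input shape, and the logging are illustrative placeholders, not part of this PR:

```ts
// smoke-test.ts — a minimal sketch to confirm onnxruntime-node 1.16.0 loads
// and runs a model. The model path and input shape are placeholders.
import * as ort from 'onnxruntime-node';

async function main(): Promise<void> {
  // Create an inference session from an ONNX model on disk.
  const session = await ort.InferenceSession.create('model.onnx');

  // Build a dummy float32 tensor; the shape must match the model's input.
  const data = Float32Array.from({ length: 3 }, () => Math.random());
  const input = new ort.Tensor('float32', data, [1, 3]);

  // Feed tensors keyed by the model's declared input names and run.
  const results = await session.run({ [session.inputNames[0]]: input });
  console.log('outputs:', Object.keys(results));
}

main().catch(console.error);
```
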
Release notes
Package name: onnxruntime-node
  • 1.16.0 - 2023-09-19

    General

    • Support for serialization of models >=2GB

    APIs

    • New session option `session.disable_cpu_ep_fallback` to disable the default CPU EP fallback (a hedged sketch follows this list)
    • Java
      • Support for fp16 and bf16 tensors as inputs and outputs, along with utilities to convert between these and fp32 data. On JDK 20 and newer the fp16 conversion methods use the JDK's Float.float16ToFloat and Float.floatToFloat16 methods which can be hardware accelerated and vectorized on some platforms.
      • Support for external initializers so that large models can be instantiated without filesystem access
    • C#
      • Expose OrtValue API as the new preferred API to run inference in C#. This reduces garbage and exposes direct native memory access via Slice like interfaces.
      • Make Float16 and BFloat16 full featured fp16 interfaces that support conversion and expose floating properties (e.g. IsNaN, IsInfinity, etc)
    • C++
      • Make Float16_t and BFloat16_t full featured fp16 interfaces that support conversion and expose floating properties (e.g. IsNaN, IsInfinity, etc)
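
As referenced above, the new CPU-fallback toggle is a session config entry. A hedged sketch of setting it from onnxruntime-node follows; the key `session.disable_cpu_ep_fallback` comes from these release notes, but whether the Node binding's `SessionOptions.extra` bag forwards it (as onnxruntime-web does for similar entries) is an assumption here, as is CUDA availability:

```ts
import * as ort from 'onnxruntime-node';

async function createStrictSession(modelPath: string) {
  // Assumption: `extra.session.*` entries reach the underlying session config,
  // mirroring onnxruntime-web's documented handling of such options.
  return ort.InferenceSession.create(modelPath, {
    executionProviders: ['cuda'], // assumes a CUDA-enabled build/runtime
    extra: {
      session: {
        // Fail session creation instead of silently falling back to the CPU EP.
        disable_cpu_ep_fallback: '1',
      },
    },
  });
}
```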

    Performance

    • Improve LLM quantization accuracy with SmoothQuant
    • Support 4-bit quantization on CPU
    • Optimize BeamScore to improve BeamSearch performance
    • Add FlashAttention v2 support for Attention, MultiHeadAttention and PackedMultiHeadAttention ops

    Execution Providers

    • CUDA EP
      • Initial fp8 support (QDQ, Cast, MatMul)
      • Relax CUDA Graph constraints to allow more models to use CUDA Graphs
      • Allow CUDA allocator to be registered with ONNX Runtime externally
    • TensorRT EP
      • CUDA Graph support
      • Support user-provided CUDA compute stream
      • Misc bug fixes and improvements
    • OpenVINO EP
      • Support OpenVINO 2023.1
    • QNN EP
      • Enable context binary cache to reduce initialization time
      • Support QNN 2.12
      • Support for resize with asymmetric transformation mode on HTP backend
      • Ops support: Equal, Less, LessOrEqual, Greater, GreaterOrEqual, LayerNorm, Asin, Sign, DepthToSpace, SpaceToDepth
      • Support 1D Conv/ConvTranspose
      • Misc bug fixes and improvements

    Mobile

    • Initial support for Azure EP
    • Dynamic shape support for CoreML
    • Improve React Native performance with JSI
    • Mobile support for CLIPImageProcessor pre-processing and CLIP scenario
    • Swift Package Manager support for ONNX Runtime inference and ONNX Runtime extensions via onnxruntime-swift-package-manager

    Web

    • webgpu ops coverage improvements (SAM, T5, Whisper)
    • webnn ops coverage improvements (SAM, Stable Diffusion)
    • Stability/usability improvements for webgpu (a selection sketch follows this list)
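
As noted above, a sketch of opting into the WebGPU EP from onnxruntime-web (sibling package to onnxruntime-node) might look like this; the model URL is a placeholder, and the import path for a WebGPU-enabled bundle may differ by version:

```ts
// Assumption: a WebGPU-capable onnxruntime-web bundle; some versions expose it
// via a dedicated entry point such as 'onnxruntime-web/webgpu'.
import * as ort from 'onnxruntime-web';

async function createWebGpuSession(): Promise<ort.InferenceSession> {
  return ort.InferenceSession.create('https://example.com/model.onnx', {
    // Providers are tried in order; listing 'wasm' after 'webgpu' gives a
    // WebAssembly (CPU) fallback when WebGPU is unavailable.
    executionProviders: ['webgpu', 'wasm'],
  });
}
```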

    Large model training

    • ORTModule + OpenAI Triton Integration now available
    • Label Sparsity compute optimization support complete and enabled by default starting release 1.16
    • New experimental embedding sparsity related optimizations available (disabled by default).
      • Improves training performance of Roberta in Transformers by 20-30%
    • Other compute optimizations enabled, such as upstream support for Gather/Slice/Reshape
    • Optimizations for LLaMAv2 (~10% acceleration) and OpenAI Whisper
    • Improvements to the logging and metrics system (initialization overhead, memory usage, statistics convergence tool, etc.)
    • PythonOp enhancement: bool and tuple[bool] constants, materialize grads, empty inputs, save in context, customized shape inference, use fully qualified names for export.
    • SCELossInternal/SCELossGradInternal CUDA kernels can handle more than std::numeric_limits<int32_t>::max() elements.
    • Improvements to LayerNorm fusion
    • A model cache for exported ONNX models is introduced, to avoid repeatedly re-exporting a model that has not changed.

    On-Device Training

    • iOS support available starting this release
    • Minimal build now available for On-Device Training. Basic binary size ~1.5 MB
    • ORT-Extensions custom op support enabled through onnxblock for on-device training scenarios

    ORT Extensions

    This ORT release is accompanied by updates to onnxruntime-extensions. Features include:

    • New Python API gen_processing_models to export ONNX data-processing models from Hugging Face tokenizers such as LLaMA, CLIP, XLM-Roberta, Falcon, BERT, etc.
    • New TrieTokenizer operator for RWKV-like LLM models, and other tokenizer operator enhancements.
    • New operators for Azure EP compatibility: AzureAudioToText, AzureTextToText, AzureTritonInvoker for Python and NuGet packages.
    • Processing operators have been migrated to the new Lite Custom Op API

    Known Issues

    • ORT CPU Python package requires the execution provider to be explicitly provided. See #17631. A fix is in progress and will be released as a patch.

    Contributions

    Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
    fs-eire, edgchen1, snnn, pengwa, mszhanyi, PeixuanZuo, tianleiwu, adrianlizarraga, baijumeswani, cloudhan, satyajandhyala, yuslepukhin, RandyShuai, RandySheriffH, skottmckay, Honry, dependabot[bot], HectorSVC, jchen351, chilo-ms, YUNQIUGUO, justinchuby, PatriceVignola, guschmue, yf711, Craigacp, smk2007, RyanUnderhill, jslhcl, wschin, kunal-vaishnavi, mindest, xadupre, fdwr, hariharans29, AdamLouly, wejoncy, chenfucn, pranavsharma, yufenglee, zhijxu-MS, jeffdaily, natke, jeffbloo, liqunfu, wangyems, er3x3, nums11, yihonglyu, sumitsays, zhanghuanrong, askhade, wenbingl, jingyanwangms, ashari4, gramalingam, georgen117, sfatimar, BowenBao, hanbitmyths, stevenlix, jywu-msft

  • 1.16.0-dev.20230908-a9df3aea72 - 2023-09-09
  • 1.16.0-dev.20230820-cbaa008391 - 2023-08-23
  • 1.16.0-dev.20230704-d540c7da0f - 2023-07-05
  • 1.16.0-dev.20230606-f013965831 - 2023-06-07
  • 1.15.1 - 2023-06-17

    This release fixed the following issues:

    1. A coding problem in test/shared_lib/test_inference.cc: it should use ASSERT_NEAR to compare float values instead of ASSERT_EQ. Without this change, some DNNL/OpenVINO tests would fail on some AMD CPUs.
    2. A misalignment error in the cublasGemmBatchedHelper function. The error only occurs when the CUDA version is 11.8 and the GPU's CUDA compute capability is >= 80 (in other words: with TensorFloat-32 support). (#15981)
    3. A build issue: building with onnxruntime_ENABLE_MEMORY_PROFILE was broken in the 1.15.0 release. (#16124)
    4. The native onnxruntime library not loading in Azure App Service. This is because 1.15.0 introduced a call to the Windows API SetThreadDescription. Though the API is available in all Windows 10 versions, some sandbox environments block it. (#15375)
    5. An alignment problem for the XNNPACK EP on Intel/AMD CPUs on PC platforms.
    6. Some training header files were missing from the 1.15.0 training NuGet package.
    7. Some fields in the OrtCUDAProviderOptionsV2 struct were not initialized.
    8. The *.dylib files in the ONNX Runtime NuGet package were not signed. (#16168)

    Known issues

    1. Segfaults when loading a model with local functions; works fine if the model is inlined by ONNX (#16170)
    2. Cross building for iOS requires manually downloading protoc (#16238)
    from [onnxruntime-node GitHub release notes](https://snyk.io/redirect/github/Microsoft/onnxruntime/releases)


Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open upgrade PRs.

For more information:

🧐 View latest project report

🛠 Adjust upgrade PR settings

🔕 Ignore this dependency or unsubscribe from future upgrade PRs

socket-security[bot] commented 10 months ago

Updated dependencies detected. Learn more about Socket for GitHub ↗︎

| Packages | Version | New capabilities | Transitives | Size | Publisher |
| --- | --- | --- | --- | --- | --- |
| onnxruntime-node | 1.15.1...1.16.0 | None | +1/-1 | 102 MB | onnxruntime |
Anush008 commented 10 months ago

This release introduces TypeScript typing issues.