CodeWithKyrian / transformers-php

Transformers PHP is a toolkit for PHP developers to add machine learning magic to their projects easily.
https://codewithkyrian.github.io/transformers-php/
Apache License 2.0

Custom inference session for improved ONNX model handling #24

Closed CodeWithKyrian closed 3 months ago

CodeWithKyrian commented 3 months ago

What:

Description:

This PR introduces a custom inference session class that improves how ONNX inference is handled within TransformersPHP. Previously, the original inference session from ankane/onnxruntime-php processed inputs and outputs as flat arrays. Since TransformersPHP works primarily with tensors, every inference call required extra conversion steps between tensors and arrays.

Here's how the original process worked:

With the new custom inference session, these unnecessary conversions and their overhead are eliminated, which conserves memory and improves performance. The session now accepts tensors as input and returns tensors as output, streamlining the process to:
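In rough terms, the two paths can be contrasted with the sketch below. It is an illustration only, not the code from this PR: `SketchTensor`, `fakeModel`, `runViaArrays`, and `runDirect` are hypothetical names, and the "model" is a stand-in that doubles each value instead of calling ONNX Runtime.

```php
<?php

// Hypothetical stand-in for a tensor: a flat buffer plus an explicit shape.
final class SketchTensor
{
    public function __construct(public array $buffer, public array $shape) {}

    // Rebuild a nested PHP array from the flat buffer (an extra copy).
    public function toArray(): array
    {
        return self::nest($this->buffer, $this->shape);
    }

    private static function nest(array $flat, array $shape): array
    {
        if (count($shape) <= 1) {
            return $flat;
        }
        $inner = array_slice($shape, 1);
        $chunk = (int) array_product($inner);
        $rows = [];
        for ($i = 0; $i < $shape[0]; $i++) {
            $rows[] = self::nest(array_slice($flat, $i * $chunk, $chunk), $inner);
        }
        return $rows;
    }
}

// Stand-in "model" (assumption): doubles every element of a flat buffer.
function fakeModel(array $flat): array
{
    return array_map(fn ($v) => $v * 2, $flat);
}

// Array-based path: tensor -> nested array -> flat array -> model -> tensor.
function runViaArrays(SketchTensor $input): SketchTensor
{
    $nested = $input->toArray();                      // conversion 1
    $flat = [];
    array_walk_recursive($nested, function ($v) use (&$flat) {
        $flat[] = $v;                                 // conversion 2
    });
    return new SketchTensor(fakeModel($flat), $input->shape); // conversion 3
}

// Tensor-based path: the buffer goes to the model without intermediate copies.
function runDirect(SketchTensor $input): SketchTensor
{
    return new SketchTensor(fakeModel($input->buffer), $input->shape);
}

$t = new SketchTensor([1.0, 2.0, 3.0, 4.0], [2, 2]);
echo json_encode(runDirect($t)->buffer === runViaArrays($t)->buffer), PHP_EOL; // prints true
```

Both paths produce the same result; the array-based one simply allocates two additional intermediate arrays per call, and that cost grows with the size of the inputs and outputs.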

Additionally, the custom inference session resolves an issue with zero-sized tensors. The previous approach struggled with zero-sized arrays, losing contextual information about their shape. By working directly with tensors, the new inference session retains shape information even for zero-sized tensors. This allows for accurate memory allocation and shape management, eliminating the need for manual adjustments for attention masks in decoder models.
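
The shape loss described above can be reproduced with plain PHP arrays. The sketch below is an illustration under assumptions, not code from this PR: `zeros` and `inferShape` are hypothetical helpers, and the shape `[1, 0, 64]` is a made-up example of a zero-sized input.

```php
<?php

// Build a nested array of zeros for the given shape (hypothetical helper).
function zeros(array $shape): array
{
    if (count($shape) === 1) {
        return array_fill(0, $shape[0], 0.0);
    }
    return array_fill(0, $shape[0], zeros(array_slice($shape, 1)));
}

// Infer a shape from a nested array by following the first element of each
// level, which is the only information a nested array carries.
function inferShape(array $nested): array
{
    $shape = [];
    $level = $nested;
    while (is_array($level)) {
        $shape[] = count($level);
        if (count($level) === 0) {
            break; // dimensions below an empty level cannot be recovered
        }
        $level = $level[0];
    }
    return $shape;
}

$intended = [1, 0, 64];

// As a nested array, the zero-sized value collapses to [[]].
echo json_encode(zeros($intended)), PHP_EOL;             // prints [[]]
echo json_encode(inferShape(zeros($intended))), PHP_EOL; // prints [1,0]

// A tensor-like pair keeps the shape next to the (empty) buffer.
$tensorLike = ['buffer' => [], 'shape' => $intended];
echo json_encode($tensorLike['shape']), PHP_EOL;         // prints [1,0,64]
```

Once the trailing dimension is gone, the shape has to be patched in by hand before the value can be used; carrying the shape alongside the buffer avoids that step.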

In summary, this PR optimizes the handling of inputs and outputs in ONNX inference sessions, reducing conversion overhead and improving memory management and performance for larger model inputs and outputs.