ankane / onnxruntime-php

Run ONNX models in PHP
MIT License

Non-zero status code returned while running Where node. #4

Closed CodeWithKyrian closed 5 months ago

CodeWithKyrian commented 5 months ago

Hello,

I'd like to report an issue encountered when using this package with large chat models like Qwen1.5-0.5B-Chat and TinyLlama-1.1B-Chat-v1.0.

Background:

I'm using your package as the backbone for my TransformersPHP package (thanks for this 🙏🏽), and it's been working very well for tasks like classification, fill-mask, feature extraction, and even basic text generation. After the initial release, I've been working to add support for chat models like the ones I mentioned above.

Issue:

Unfortunately, I've been consistently encountering a non-zero status code error with both models. Here's the exact error message:

 Non-zero status code returned while running Where node. Name:'/model/Where_4' Status Message: /Users/runner/work/1/s/onnxruntime/core/providers/cpu/math/element_wise_ops.h:540 void onnxruntime::BroadcastIterator::Init(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 50 by 51

For the TinyLlama model, the shapes are reported as 50 by 51. Similarly, for Qwen, the shapes are 22 by 23.

Troubleshooting:

I've debugged the issue and believe the reported numbers might be related to the input shapes. However, my TransformersPHP package already validates input shapes, ensuring all required inputs are provided with the correct dimensions, so I'm not sure where exactly the problem originates.
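To illustrate the kind of check I mean, one can compare what the ONNX graph declares against what is actually being fed in (this assumes the Model::inputs() metadata helper from this package; the path and input values below are just placeholders):

<?php

use OnnxRuntime\Model;

// placeholder path to the exported decoder model
$model = new Model('decoder_model_merged.onnx');

// dump the input names/shapes declared by the ONNX graph...
print_r($model->inputs());

// ...and compare against the shapes actually being passed in
$inputs = [
    'input_ids' => [[1, 2, 3]],      // placeholder values from the tokenizer
    'attention_mask' => [[1, 1, 1]],
];

foreach ($inputs as $name => $value) {
    echo $name, ' => [', count($value), ', ', count($value[0]), ']', PHP_EOL;
}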

Could you please investigate this issue? Any insights or suggestions on how to resolve the broadcasting error for these large chat models would be greatly appreciated.

ankane commented 5 months ago

Hi @CodeWithKyrian, TransformersPHP looks sweet. It looks like it may be an issue with the model (https://github.com/microsoft/onnxruntime/issues/7888). Do you see the same error if you run the models with the same inputs in Python?

CodeWithKyrian commented 5 months ago

Thanks @ankane. I ran the models with the same inputs in Python and it ran smoothly with no errors. What do you suggest I do then, given that the issue you mentioned has been open for 3+ years?

ankane commented 5 months ago

Can you share the Python script, as well as the commands used to generate the models? Will try to dig into it when I have some time.

CodeWithKyrian commented 5 months ago

Here's the Python Script in Colab

Then for the ONNX models: Xenova has converted many popular models to ONNX and hosted them on the Hugging Face Hub, so I use those in most cases. I only convert models myself when I need a specific one that hasn't been converted yet. The specific model I used here is Xenova/Qwen1.5-0.5B-Chat.

Here's the conversion script Xenova uses to convert the models; it uses Optimum under the hood.
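For a rough idea of what that boils down to, here's a minimal sketch of the Optimum export it wraps (not Xenova's actual script; the output directory name is just illustrative):

from optimum.exporters.onnx import main_export

# Export the chat model to ONNX; the "-with-past" task variant keeps the
# KV-cache inputs/outputs in the exported graph, as Xenova's models do.
main_export(
    "Qwen/Qwen1.5-0.5B-Chat",
    output="qwen1.5-0.5b-chat-onnx",
    task="text-generation-with-past",
)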

ankane commented 5 months ago

Thanks for sharing. However, the Python script doesn't appear to be using the ONNX model (you'll want to use the onnxruntime package to test it rather than transformers).

CodeWithKyrian commented 5 months ago

I tried switching the notebook to use the ONNX Runtime model, but unlike transformers, onnxruntime lacks built-in features for tokenization, attention masks, and input handling. The required inputs for this seq2seq model are quite complex (see image).

[Screenshot: the model's full list of required inputs, beyond input_ids and attention_mask]

As you can see from the required inputs in the screenshot, it goes beyond basic input_ids and attention_mask. I attempted to modify the TextGenerationPipeline to use the ONNX model instead of the PyTorch/TF model it's loaded with, but seq2seq models like this one present a much bigger challenge.

So I'm not sure there's anything else I can do at this point to test the Python onnxruntime on this particular task 🤦🏽
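(For completeness, one possible middle ground would be letting Optimum drive the ONNX session while transformers handles the tokenization - a rough, untested sketch is below. The subfolder/file_name values are guesses at where the repo keeps its ONNX weights, and it still wouldn't be a raw onnxruntime test.)

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "Xenova/Qwen1.5-0.5B-Chat"

# tokenization via transformers, forward pass via onnxruntime (wrapped by Optimum)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(
    model_id,
    subfolder="onnx",                       # Xenova repos keep ONNX weights in onnx/
    file_name="decoder_model_merged.onnx",  # guess at the merged decoder export
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))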

ankane commented 5 months ago

The goal of running it in Python is to determine whether it's an issue with ONNX Runtime PHP or something else. Since you're seeing the error with ONNX Runtime PHP, you should be able to take the same model and inputs (you can serialize them to JSON if needed), load them in Python, and try to run it there.

<?php

$model = new OnnxRuntime\Model('model.onnx');

$inputs = ['input_ids' => [...], 'attention_mask' => [...]];
file_put_contents('inputs.json', json_encode($inputs));

$model->predict($inputs); // ERROR: Non-zero status code returned ...

import json

import numpy as np
import onnxruntime as ort

with open('inputs.json') as f:
    inputs = json.load(f)

# ONNX Runtime expects numpy arrays rather than plain lists;
# input_ids and attention_mask are int64 in these exports
inputs = {name: np.array(value, dtype=np.int64) for name, value in inputs.items()}

sess = ort.InferenceSession('model.onnx')
sess.run(None, inputs)

If it raises the same error, it's likely an issue with either the inputs or how the model was converted to ONNX (in which case you'll want to report it there).

CodeWithKyrian commented 4 months ago

Hello @ankane ,

I hope you're doing well. I'm sorry for the radio silence all this time. I've been working through the issue and tried your suggestion, but the same error occurred on both the JavaScript and Python ONNX Runtimes. After looking deeper into the problem, I think I've got a better understanding of what's going on.

Summary of Findings

My Approach

Impact and Concerns

Thank you for your support and feedback throughout this process. I appreciate your time and would like to hear your thoughts on this matter.

ankane commented 4 months ago

Hi @CodeWithKyrian, thanks for sharing. I think we could probably add a minimal tensor class to the project to support this. That being said, it's totally fine to distribute a custom version - it just needs to include the license file.

CodeWithKyrian commented 4 months ago

Thanks for the support! I'll make sure to include the license file as required.

For the tensor class, I recommend checking out the Rindow Math Matrix library, particularly the NdArrayPHP class. Although the library has become a bit complex, the NdArrayPHP class uses an SplFixedArray as a one-dimensional buffer, and the NdArray interface defines the methods necessary for working with tensors: shape(), size(), dtype(), etc.

One approach you might consider is either adopting the NdArray interface or defining a similar interface of your own if you'd prefer to avoid adding an extra dependency. By structuring the library to accept any tensor that implements the interface, users get more flexibility and the library becomes more extensible.
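To make the idea concrete, here's a rough sketch of the kind of minimal contract I mean (names are illustrative, not Rindow's exact API):

<?php

// Minimal tensor contract, loosely modelled on Rindow's NdArray interface
interface Tensor
{
    /** @return int[] e.g. [2, 3] for a 2x3 tensor */
    public function shape(): array;

    /** Total number of elements (the product of the shape). */
    public function size(): int;

    /** Element type, e.g. 'int64' or 'float32'. */
    public function dtype(): string;

    /** Flat, row-major buffer backing the tensor. */
    public function buffer(): SplFixedArray;
}

// The library could then accept nested PHP arrays or anything implementing Tensor,
// e.g. $model->predict(['input_ids' => $ids, 'attention_mask' => $mask]);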

Let me know if you have any questions or if there's anything else I can do to help!

CodeWithKyrian commented 4 months ago

Now that I think of it, I could contribute the Tensor class too. I'll try to come up with something and send in a PR, hopefully by the end of the week, and we'll see how it goes.