ankane / onnxruntime-php

Run ONNX models in PHP
MIT License

Non-zero status code returned while running Where node. #4

Closed CodeWithKyrian closed 5 months ago

CodeWithKyrian commented 5 months ago

Hello,

I'd like to report an issue encountered when using this package with large chat models like Qwen1.5-0.5B-Chat and TinyLlama-1.1B-Chat-v1.0.

Background:

I'm using your package as the backbone for my TransformersPHP package (thanks for this 🙏🏽), and it's been working very well for tasks like classification, fill-mask, feature extraction, and even basic text generation. After the initial release, I've been working to add support for chat models like the ones I mentioned above.

Issue:

Unfortunately, I've been consistently encountering a non-zero status code error with both models. Here's the exact error message:

 Non-zero status code returned while running Where node. Name:'/model/Where_4' Status Message: /Users/runner/work/1/s/onnxruntime/core/providers/cpu/math/element_wise_ops.h:540 void onnxruntime::BroadcastIterator::Init(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 50 by 51

For the TinyLlama model, the shapes are reported as 50 by 51. Similarly, for Qwen, the shapes are 22 by 23.

Troubleshooting:

I've debugged the issue and believe the reported numbers might be related to the input shapes. However, my TransformersPHP package already validates input shapes, ensuring all required inputs are provided with the correct dimensions, so I'm not sure where exactly the problem originates.
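To illustrate the kind of check I mean, one can compare what the ONNX graph declares against what is actually being fed in (this assumes the Model::inputs() metadata helper from this package; the path and input values below are just placeholders):

<?php

use OnnxRuntime\Model;

// placeholder path to the exported decoder model
$model = new Model('decoder_model_merged.onnx');

// dump the input names/shapes declared by the ONNX graph...
print_r($model->inputs());

// ...and compare against the shapes actually being passed in
$inputs = [
    'input_ids' => [[1, 2, 3]],      // placeholder values from the tokenizer
    'attention_mask' => [[1, 1, 1]],
];

foreach ($inputs as $name => $value) {
    echo $name, ' => [', count($value), ', ', count($value[0]), ']', PHP_EOL;
}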

Could you please investigate this issue? Any insights or suggestions on how to resolve the broadcasting error for these large chat models would be greatly appreciated.

ankane commented 5 months ago

Hi @CodeWithKyrian, TransformersPHP looks sweet. It looks like it may be an issue with the model (https://github.com/microsoft/onnxruntime/issues/7888). Do you see the same error if you run the models with the same inputs in Python?

CodeWithKyrian commented 5 months ago

Thanks @ankane. I ran the models with the same inputs in Python and it ran smoothly with no errors. What do you suggest I do then, given that the issue you mentioned has been open for 3+ years?

ankane commented 5 months ago

Can you share the Python script, as well as the commands used to generate the models? Will try to dig into it when I have some time.

CodeWithKyrian commented 5 months ago

Here's the Python Script in Colab

Then for the ONNX models: Xenova has converted many popular models to ONNX and hosted them on the Hugging Face Hub, so I use those in most cases. I only convert models myself when I need a specific one that hasn't been converted yet. The specific model I used here is Xenova/Qwen1.5-0.5B-Chat.

Here's the conversion script Xenova uses to convert the models; it uses Optimum under the hood.
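For a rough idea of what that boils down to, here's a minimal sketch of the Optimum export it wraps (not Xenova's actual script; the output directory name is just illustrative):

from optimum.exporters.onnx import main_export

# Export the chat model to ONNX; the "-with-past" task variant keeps the
# KV-cache inputs/outputs in the exported graph, as Xenova's models do.
main_export(
    "Qwen/Qwen1.5-0.5B-Chat",
    output="qwen1.5-0.5b-chat-onnx",
    task="text-generation-with-past",
)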

ankane commented 5 months ago

Thanks for sharing. However, the Python script doesn't appear to be using the ONNX model (you'll want to use the onnxruntime package to test it rather than transformers).

CodeWithKyrian commented 5 months ago

I tried switching the notebook to use the ONNX Runtime model, but unlike transformers, onnxruntime lacks built-in features for tokenization, attention masks, and input handling. The required inputs for this seq2seq model are quite complex (see image).

[Screenshot: the model's full list of required inputs, beyond input_ids and attention_mask]

As you can see from the required inputs in the screenshot, it goes beyond basic input_ids and attention_mask. I attempted to modify the TextGenerationPipeline to use the ONNX model instead of the PyTorch/TF model it's loaded with, but seq2seq models like this one present a much bigger challenge.

So I'm not sure there's anything else I can do at this point to test the Python onnxruntime on this particular task 🤦🏽
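(For completeness, one possible middle ground would be letting Optimum drive the ONNX session while transformers handles the tokenization - a rough, untested sketch is below. The subfolder/file_name values are guesses at where the repo keeps its ONNX weights, and it still wouldn't be a raw onnxruntime test.)

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "Xenova/Qwen1.5-0.5B-Chat"

# tokenization via transformers, forward pass via onnxruntime (wrapped by Optimum)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(
    model_id,
    subfolder="onnx",                       # Xenova repos keep ONNX weights in onnx/
    file_name="decoder_model_merged.onnx",  # guess at the merged decoder export
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))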

ankane commented 5 months ago

The goal of running it in Python is to determine whether it's an issue with ONNX Runtime PHP or something else. Since you're seeing the error with ONNX Runtime PHP, you should be able to take the same model and inputs (you can serialize them to JSON if needed), load them in Python, and try to run it there.

<?php

$model = new OnnxRuntime\Model('model.onnx');

$inputs = ['input_ids' => [...], 'attention_mask' => [...]];
file_put_contents('inputs.json', json_encode($inputs));

$model->predict($inputs); // ERROR: Non-zero status code returned ...

import json

import numpy as np
import onnxruntime as ort

with open('inputs.json') as f:
    inputs = json.load(f)

# ONNX Runtime expects numpy arrays rather than plain lists;
# input_ids and attention_mask are int64 in these exports
inputs = {name: np.array(value, dtype=np.int64) for name, value in inputs.items()}

sess = ort.InferenceSession('model.onnx')
sess.run(None, inputs)

If it raises the same error, it's likely an issue with either the inputs or how the model was converted to ONNX (in which case you'll want to report it there).

CodeWithKyrian commented 4 months ago

Hello @ankane ,

I hope you're doing well. I'm sorry for the radio silence all this time. I've been working through the issue and tried your suggestion, but the same error occurred on both the JavaScript and Python ONNX Runtimes. After looking deeper into the problem, I think I've got a better understanding of what's going on.

Summary of Findings

My Approach

Impact and Concerns

Thank you for your support and feedback throughout this process. I appreciate your time and would like to hear your thoughts on this matter.

ankane commented 4 months ago

Hi @CodeWithKyrian, thanks for sharing. I think we could probably add a minimal tensor class to the project to support this. That being said, it's totally fine to distribute a custom version - it just needs to include the license file.

CodeWithKyrian commented 4 months ago

Thanks for the support! I'll make sure to include the license file as required.

For the tensor class, I recommend checking out the Rindow Math Matrix library, particularly the NdArrayPHP class. Although the library has become a bit complex, the NdArrayPHP class uses an SplFixedArray as a one-dimensional buffer, and the NdArray interface defines the methods necessary for working with tensors: shape(), size(), dtype(), etc.

One approach you might consider is either adopting the NdArray interface or defining a similar interface of your own if you'd prefer to avoid adding an extra dependency. By structuring the library to accept any tensor that implements the interface, users get more flexibility and the library becomes more extensible.
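To make the idea concrete, here's a rough sketch of the kind of minimal contract I mean (names are illustrative, not Rindow's exact API):

<?php

// Minimal tensor contract, loosely modelled on Rindow's NdArray interface
interface Tensor
{
    /** @return int[] e.g. [2, 3] for a 2x3 tensor */
    public function shape(): array;

    /** Total number of elements (the product of the shape). */
    public function size(): int;

    /** Element type, e.g. 'int64' or 'float32'. */
    public function dtype(): string;

    /** Flat, row-major buffer backing the tensor. */
    public function buffer(): SplFixedArray;
}

// The library could then accept nested PHP arrays or anything implementing Tensor,
// e.g. $model->predict(['input_ids' => $ids, 'attention_mask' => $mask]);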

Let me know if you have any questions or if there's anything else I can do to help!

CodeWithKyrian commented 4 months ago

Now that I think of it, I could contribute the Tensor class too. I'll try to come up with something and send in a PR, hopefully by the end of the week, and we'll see how it goes.