Closed: CodeWithKyrian closed this issue 5 months ago.
Hi @CodeWithKyrian, TransformersPHP looks sweet. It looks like it may be an issue with model (https://github.com/microsoft/onnxruntime/issues/7888). Do you see the same error if you run the models with the same inputs in Python?
Thanks @ankane. I ran the models with the same inputs in Python and they ran smoothly with no errors. What do you suggest I do, given that the issue you mentioned has been open for 3+ years?
Can you share the Python script, as well as the commands used to generate the models? Will try to dig into it when I have some time.
Here's the Python Script in Colab
Then for the ONNX models I use, Xenova has converted many popular models to ONNX and hosted them on the HuggingFace Hub so I use those models in most cases. I only convert myself when there are specific models I need and they haven't been converted. The specific model I used is Xenova/Qwen1.5-0.5B-Chat.
Here's the conversion script Xenova uses to convert the models which uses Optimum under the hood.
Thanks for sharing. However, the Python script doesn't appear to be using the ONNX model (you'll want to use the onnxruntime package to test it rather than transformers).
I tried switching the notebook to use the ONNX Runtime model, but unlike transformers, onnxruntime lacks built-in features for tokenization, attention masks, and input handling. The required inputs for this seq2seq model are quite complex (see image).
As you can see from the required inputs (image reference omitted for clarity), it goes beyond basic input_ids and attention_mask. While I attempted to modify the TextGenerationPipeline to use the ONNX model instead of the PyTorch/TF model it's loaded with, seq2seq models like this one present a much bigger challenge.
So I'm not sure I can do anything else at this point to test the Python onnxruntime on this particular task 🤦🏽
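To give a sense of that complexity, here's a sketch of the kind of feed a decoder-with-past ONNX export typically expects on the first generation step. These are toy sizes, and the input names follow the convention Optimum commonly uses (past_key_values.N.key/value); they're assumptions for illustration, not read from the actual Qwen1.5-0.5B-Chat export:

```python
import numpy as np

# Toy dimensions for illustration only; the real model's layer/head
# counts differ and the exact input names depend on the export.
batch, num_layers, num_heads, head_dim = 1, 2, 4, 8

feed = {
    "input_ids": np.array([[1, 2, 3]], dtype=np.int64),
    "attention_mask": np.ones((1, 3), dtype=np.int64),
    "position_ids": np.arange(3, dtype=np.int64)[None, :],
}
for i in range(num_layers):
    # On the first step there is no cache yet, so the "past" tensors
    # have a zero-length sequence dimension.
    feed[f"past_key_values.{i}.key"] = np.zeros(
        (batch, num_heads, 0, head_dim), dtype=np.float32)
    feed[f"past_key_values.{i}.value"] = np.zeros(
        (batch, num_heads, 0, head_dim), dtype=np.float32)
```

Transformers pipelines build all of this automatically, which is why driving the raw session by hand is so much more work.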
The goal of running it in Python is to determine if it's an issue with ONNX Runtime PHP or something else. Since you're seeing the error with ONNX Runtime PHP, you should be able to take the same model and inputs (you can serialize them to JSON if needed), load them in Python, and try to run it there.
```php
<?php

$model = new OnnxRuntime\Model('model.onnx');
$inputs = ['input_ids' => [...], 'attention_mask' => [...]];
file_put_contents('inputs.json', json_encode($inputs)); // save inputs for replay in Python
$model->predict($inputs); // ERROR: Non-zero status code returned ...
```
```python
import json

import onnxruntime as ort

with open('inputs.json') as f:
    inputs = json.load(f)

sess = ort.InferenceSession('model.onnx')
sess.run(None, inputs)
```
If it raises the same error, it's likely an issue with either the inputs or how the model was converted to ONNX (in which case you'll want to report it there).
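As a quick sanity check before replaying the inputs, it's worth confirming that the JSON round trip itself preserves the nested-list structure (placeholder values below, not the real failing inputs):

```python
import json

# Placeholder inputs standing in for whatever the PHP side serialized.
inputs = {"input_ids": [[101, 2023, 102]], "attention_mask": [[1, 1, 1]]}

# Round-trip through JSON, as the PHP snippet does with json_encode().
serialized = json.dumps(inputs)
restored = json.loads(serialized)
```

If the restored structure matches exactly, any error seen in Python points at the model or the input values, not the serialization step.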
Hello @ankane ,
I hope you're doing well. I'm sorry for the radio silence all this time. I've been working through the issue and tried your suggestion, but the same error occurred on both the JavaScript and Python ONNX Runtimes. After looking deeper into the problem, I think I've got a better understanding of what's going on.
Thank you for your support and feedback throughout this process. I appreciate your time and would like to hear your thoughts on this matter.
Hi @CodeWithKyrian, thanks for sharing. I think we could probably add a minimal tensor class to the project to support this. That being said, it's totally fine to distribute a custom version - it just needs to include the license file.
Thanks for the support! I'll make sure to include the license file as required.
For the tensor class, I recommend checking out the Rindow Math Matrix library, particularly the NdArrayPHP class. Although the library has become a bit complex, the NdArrayPHP class uses an SplFixedArray as a one-dimensional buffer, and the NdArray interface defines the methods necessary for working with tensors - shape(), size(), dtype(), etc.
One approach you might consider is either extending the NdArray class or defining a similar interface if you'd prefer to avoid adding an extra dependency. By structuring the library to accept tensors that implement this interface, users will have more flexibility and the library will be more extensible.
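As a rough illustration of that flat-buffer design, here is a Python analogue (the real thing would be PHP with SplFixedArray; the names and details here are my own sketch, not Rindow's actual API):

```python
from array import array


class Tensor:
    """Minimal flat-buffer tensor: a 1-D typed buffer plus shape
    metadata, mirroring the SplFixedArray-backed design described
    above. Illustrative sketch only."""

    def __init__(self, data, shape, dtype='d'):
        self._buffer = array(dtype, data)  # contiguous 1-D storage
        self._shape = tuple(shape)
        expected = 1
        for dim in self._shape:
            expected *= dim
        assert len(self._buffer) == expected, "shape does not match buffer size"

    def shape(self):
        return self._shape

    def size(self):
        return len(self._buffer)

    def dtype(self):
        return self._buffer.typecode

    def __getitem__(self, idx):
        # Row-major (C-order) flattening of a multi-dimensional index.
        flat = 0
        for i, dim in zip(idx, self._shape):
            flat = flat * dim + i
        return self._buffer[flat]


t = Tensor([1, 2, 3, 4, 5, 6], shape=(2, 3))
```

Keeping the data in one flat typed buffer makes reshapes free (only the shape metadata changes) and keeps the memory layout close to what ONNX Runtime expects.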
Let me know if you have any questions or if there's anything else I can do to help!
Now that I think of it, I could contribute to the Tensor class too. I'll try to come up with something and send in a PR, hopefully by the end of the week, and we'll see how it goes.
Hello,
I'd like to report an issue encountered when using this package with large chat models like Qwen1.5-0.5B-Chat and TinyLlama-1.1B-Chat-v1.0.
Background:
I'm using your package as the backbone for my TransformersPHP package (thanks for this 🙏🏽), and it's been working very well for tasks like classification, fill mask, feature extraction, and even basic text generation. After the initial release, I've been working to add support for chat models like the ones I mentioned above.
Issue:
Unfortunately, I've been consistently encountering a non-zero status code error with both models. Here's the exact error message:
For the TinyLlama model, the shapes are reported as 50 by 51. Similarly, for Qwen, the shapes are 22 by 23.
Troubleshooting:
I've debugged the issue and believe the reported numbers might be related to the input shapes. However, my TransformersPHP package already validates input shapes, ensuring all required inputs are provided with the correct dimensions, so I'm unsure where exactly the problem is originating.
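For reference, the kind of validation described above can be sketched like this (a hypothetical helper in Python for illustration; it is not the actual TransformersPHP validation code):

```python
def infer_shape(value):
    # Walk nested lists and record the length at each depth,
    # yielding a shape tuple like (batch, seq_len).
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def find_mismatched(inputs):
    # Every token-level input (input_ids, attention_mask, ...) should
    # share the same (batch, seq_len) shape; report any that don't
    # match the first one.
    shapes = {name: infer_shape(v) for name, v in inputs.items()}
    reference = next(iter(shapes.values()))
    return {name: s for name, s in shapes.items() if s != reference}


ok = find_mismatched({"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]})
bad = find_mismatched({"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1]]})
```

A check like this passing while the runtime still reports shapes of N and N+1 is what makes the error so puzzling from the caller's side.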
Could you please investigate this issue? Any insights or suggestions on how to resolve the broadcasting error for these large chat models would be greatly appreciated.