Wrong input size at converting and wrong ineference values on popular models (SD, Whisper)

shadowofsoul commented 9 months ago

Hello,

tried to convert some popular models to MNN (stable diffusion, whisper, u2net). all from ONNX to MNN.

u2net worked out of the box. But i'm seeing issues on SD and whisper.

The main problem, is that input size is wrongly calculated. I observe that some dims at the inputs are negatives. example: generates an input_ids: [1,-1] when it should be [1, 77] on the text_encoder of SD. Same happens with whisper, some of the dims of input are -1. When this happen, i get the classic error "Session can't be run why is not resized"

tried to solve this manually by exporting to json and changing the -1 of the dims to the correct value and then convert it back to mnn. in that way, i could execute the model but the ineference result is way off what it should be.

i found an example of SD that works with a chinese model: https://github.com/wangzhaode/mnn-stable-diffusion but that works why the text_encoder is a bert model, not a CLIP model that most SD version uses.

i have tested the ONNX models that are converted from to MNN and they all work, input and outputs are perfect. Also, i set up the dinamic_axis property too just in case at exporting.

I'd like to have some guiance/hints/tips of where it could be the problem and so i can help.

MNN doesn't support whisper, CLIP or AutoEncoders models? why? i'm doing something wrong? Converting from pytorch (and not ONNX) where are functions that are not supported by MNN (like argmax, full etc...) could solve the problem?

i like MNN for the optimization and the low weight that it has on the final binary on mobile, i'd like to expand it supporting this popular models and probably i would try to implement a LCM scheduler to support latency models too to have 4 steps ineference. So any information anywone could give me, can help me out.

shadowofsoul commented 9 months ago

An update to this, i have been able to solve the dinamic axis by doing resizeTensor and resizeSession correctly. but the output of the model is too offset from the original one (and exporting the model with fixed dinamic inputs is even worse). it would be nice why, but i have no idea where to look at (and debug options doesn't give me a hint of where the problem could be).

kmn1024 commented 9 months ago

I too have faced low quality outputs after conversion, but for another application: https://github.com/wangzhaode/mnn-llm/issues/150

It probably comes down to the quantization algorithms available on MNN.

v0jiuqi commented 9 months ago

An update to this, i have been able to solve the dinamic axis by doing resizeTensor and resizeSession correctly. but the output of the model is too offset from the original one (and exporting the model with fixed dinamic inputs is even worse). it would be nice why, but i have no idea where to look at (and debug options doesn't give me a hint of where the problem could be).

For Onnx models, can use "testMNNFromOnnx.py" to review the converted MNN model outputs.

github-actions[bot] commented 7 months ago

Marking as stale. No activity in 60 days.

alibaba / MNN

Wrong input size at converting and wrong ineference values on popular models (SD, Whisper) #2745