devanshkhandekar opened 1 month ago
You can try the florence2-large model. Here is the answer I got from the large model.
Download model files and config.json and move them to the florence2/models
folder.
Then generate genai_config.json from config.json:
```shell
bazel run //florence2:converter -- --config_json `pwd`/florence2/models/config.json --genai_config_json `pwd`/florence2/models/genai_config.json
```
Create a vision encoder model with preprocessing included:
```shell
bazel run //florence2:preprocessing_converter -- --original_vision_encoder `pwd`/florence2/models/vision_encoder.onnx --vision_encoder_with_preprocessing `pwd`/florence2/models/vision_encoder_with_preprocessing.onnx
```
Rescale the input image to 768x768 and use the prompts the authors used for training: https://huggingface.co/microsoft/Florence-2-base/blob/main/processing_florence2.py#L117
Hey, do you have any resource I can refer to for running inference with onnxruntime-genai for Florence2 in Python?
I am getting very suboptimal results with the C++ ONNX version. I have tweaked config.json multiple times, but the output is still highly inaccurate. What changes do I need to make to get good-quality captions?