devanshkhandekar opened 1 month ago
You can try the florence2-large model. Here is the answer I got from the large model.
Download model files and config.json and move them to the florence2/models
folder.
Then generate genai_config.json from config.json:
```shell
bazel run //florence2:converter -- --config_json `pwd`/florence2/models/config.json --genai_config_json `pwd`/florence2/models/genai_config.json
```
Create a vision encoder model with preprocessing included:
```shell
bazel run //florence2:preprocessing_converter -- --original_vision_encoder `pwd`/florence2/models/vision_encoder.onnx --vision_encoder_with_preprocessing `pwd`/florence2/models/vision_encoder_with_preprocessing.onnx
```
Rescale the input image to 768x768 and use the prompts the authors used for training: https://huggingface.co/microsoft/Florence-2-base/blob/main/processing_florence2.py#L117
Hey, do you have any resource I can refer to for running inference with onnxruntime-genai for Florence2 in Python?
I am getting very suboptimal results with the C++ ONNX version. I have tweaked config.json multiple times, but the output is still highly inaccurate. What changes do I need to make to get good-quality captions?