Receipt-Wrangler / receipt-wrangler-api

Server for Receipt Wrangler
GNU Affero General Public License v3.0

Possible to NOT use OCR? #276

Status: Closed (joestump closed this issue 3 months ago)

joestump commented 3 months ago

Is it possible to skip OCR entirely and just use the file uploaded directly with OpenAI?

joestump commented 3 months ago

The OCR being returned on any photos of a physical receipt is pretty bad:

oie Py aa oe e Lily Cla ‘ yt Ne ’ od a YAS rh ’ 25. Layey a ‘ Doe SS aOR ee RSE SRC ie ea Ce ges 02 sabe Bhat 2 7 Nigh ANA ope nets Oe. eles SEES uaa tat rwury fa sae . RA Pa Ss ee é ¢ J WS RITA Bea Bg nore Tea re Swe Mea Se Se ok 8 ae REE GR ee Aaa ae 

When I try EasyOCR instead, I get this error: Receipt data from AI: signal: illegal instruction (core dumped)

I would prefer to let OpenAI both read the text in the image and extract the information from it.

Noah231515 commented 3 months ago

I have been experimenting recently with the viability of using vision/multimodal models for Receipt Wrangler. It is definitely a lot easier than using OCR, and the whole process is much faster too.

I'll mark this as a feature request. Assuming it works well enough, it will be implemented in Receipt Wrangler. I'll also look into what's going on with EasyOCR.

For OCR in general, the photos need to be as clear as possible to get good results. Tesseract in particular works best on black-and-white images with minimal wrinkles in the receipt. Receipt Wrangler does preprocess every image before performing OCR (de-skewing to straighten the image, converting to black and white, and removing noise), so the input image doesn't have to be perfect. Better quality still helps, though.
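To make the preprocessing steps above more concrete, here is a minimal sketch in Go using only the standard library. It is not Receipt Wrangler's actual pipeline: the function names (grayFromRows, toGray, medianFilter3x3, binarize) are hypothetical, the fixed threshold is an assumption, and de-skewing is left out because it needs more than a few lines.

```go
package main

import (
	"image"
	"image/color"
	"sort"
)

// grayFromRows builds a small grayscale image from rows of pixel values.
// Hypothetical helper, only here so the sketch is easy to try out.
func grayFromRows(rows [][]uint8) *image.Gray {
	h, w := len(rows), 0
	if h > 0 {
		w = len(rows[0])
	}
	img := image.NewGray(image.Rect(0, 0, w, h))
	for y, row := range rows {
		for x, v := range row {
			img.SetGray(x, y, color.Gray{Y: v})
		}
	}
	return img
}

// toGray converts any image to 8-bit grayscale.
func toGray(src image.Image) *image.Gray {
	b := src.Bounds()
	dst := image.NewGray(b)
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			dst.Set(x, y, src.At(x, y)) // Set converts through the gray color model
		}
	}
	return dst
}

// medianFilter3x3 removes isolated specks by replacing each pixel with the
// median of its 3x3 neighbourhood. Neighbours outside the image are skipped.
func medianFilter3x3(src *image.Gray) *image.Gray {
	b := src.Bounds()
	dst := image.NewGray(b)
	window := make([]uint8, 0, 9)
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			window = window[:0]
			for dy := -1; dy <= 1; dy++ {
				for dx := -1; dx <= 1; dx++ {
					p := image.Pt(x+dx, y+dy)
					if p.In(b) {
						window = append(window, src.GrayAt(p.X, p.Y).Y)
					}
				}
			}
			sort.Slice(window, func(i, j int) bool { return window[i] < window[j] })
			dst.SetGray(x, y, color.Gray{Y: window[len(window)/2]})
		}
	}
	return dst
}

// binarize maps pixels at or above threshold to white and the rest to black.
// A fixed threshold is an assumption; real pipelines often adapt it per image.
func binarize(src *image.Gray, threshold uint8) *image.Gray {
	b := src.Bounds()
	dst := image.NewGray(b)
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			if src.GrayAt(x, y).Y >= threshold {
				dst.SetGray(x, y, color.Gray{Y: 255})
			}
			// Pixels below the threshold keep the zero value (black).
		}
	}
	return dst
}
```

A decoded photo would go through the steps in order, for example binarize(medianFilter3x3(toGray(img)), 128). Denoising before thresholding keeps single dark specks from surviving as black dots in the final image.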

Noah231515 commented 3 months ago

Support for OpenAI Vision has been added. Make sure to update your container(s) first, then check out https://receiptwrangler.io/docs/concepts/system-settings/receipt-processing-settings#managing-open-aigemini-receipt-processing-settings to learn more.

Just pick the model you would like to use in the Receipt Processing Settings (either gpt-4o or gpt-4o-mini), then check the "Use Vision?" checkbox and you'll be good to go.
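For anyone curious what skipping OCR looks like at the API level, here is a rough sketch in Go of a Chat Completions request body that carries the receipt image inline as a base64 data URL. It only illustrates the request shape and is not taken from Receipt Wrangler's code; the type names and the newVisionRequest helper are hypothetical, and the prompt text is up to the caller.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
)

// imageURL wraps the image reference; a data URL embeds the file inline.
type imageURL struct {
	URL string `json:"url"`
}

// contentPart is one piece of a multimodal message: either text or an image.
type contentPart struct {
	Type     string    `json:"type"`
	Text     string    `json:"text,omitempty"`
	ImageURL *imageURL `json:"image_url,omitempty"`
}

type message struct {
	Role    string        `json:"role"`
	Content []contentPart `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// newVisionRequest (hypothetical helper) pairs a text prompt with the raw
// image bytes in a single user message, so no OCR text is needed.
func newVisionRequest(model, prompt, mimeType string, img []byte) chatRequest {
	dataURL := "data:" + mimeType + ";base64," + base64.StdEncoding.EncodeToString(img)
	return chatRequest{
		Model: model,
		Messages: []message{{
			Role: "user",
			Content: []contentPart{
				{Type: "text", Text: prompt},
				{Type: "image_url", ImageURL: &imageURL{URL: dataURL}},
			},
		}},
	}
}

// JSON serializes the request into the body that would be sent to the API.
func (r chatRequest) JSON() ([]byte, error) {
	return json.Marshal(r)
}
```

The resulting body would be sent as a POST to https://api.openai.com/v1/chat/completions with an Authorization: Bearer header holding your API key. The model then reads the receipt text and extracts the fields in one step.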