Dicklesworthstone / llm_aided_ocr

Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.

Taking a lot of time for the sample pdf #12

Open SouravaBehera opened 1 month ago

SouravaBehera commented 1 month ago

I am running the sample PDF on a 16-core CPU with a local model, the same one used in the GitHub repo, and it is taking a lot of time.

How can I get the output faster?

Dicklesworthstone commented 1 month ago

It just works much, much better using the API. I highly recommend doing that, considering the cost is totally negligible. With the API, the chunks are all submitted in parallel, asynchronously, rather than processed one at a time, so it’s far faster.
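
To illustrate why submitting chunks concurrently helps, here is a minimal sketch. It is not the repo's actual code: `correct_chunk` is a hypothetical stand-in that simulates one API request with a short sleep, and `correct_all`, `run_demo`, and `max_concurrency` are names assumed for the example.

```python
import asyncio
import time


async def correct_chunk(chunk: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for one LLM API request (assumption, not the repo's code).

    The sleep simulates network latency; a real call would await an HTTP request.
    """
    await asyncio.sleep(delay)
    return chunk.strip()


async def correct_all(chunks: list[str], max_concurrency: int = 8) -> list[str]:
    """Submit every chunk concurrently, capped at max_concurrency in-flight requests."""
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(chunk: str) -> str:
        async with sem:
            return await correct_chunk(chunk)

    # gather() returns results in the same order as the input chunks.
    return await asyncio.gather(*(worker(c) for c in chunks))


def run_demo() -> tuple[list[str], float]:
    """Process 8 simulated chunks and report the wall-clock time taken."""
    chunks = [f" chunk {i} " for i in range(8)]
    start = time.perf_counter()
    results = asyncio.run(correct_all(chunks))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the simulated requests overlap, the total time is close to the latency of a single request instead of the sum of all eight. A local model on CPU usually cannot overlap work this way, since each chunk competes for the same cores.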

SouravaBehera commented 1 month ago

Can you suggest any free API models?

Dicklesworthstone commented 1 month ago

They aren’t free, but the cost is extremely low for OpenAI using the model the code is already configured for. We are talking a few cents, tops, for a document.