Dicklesworthstone / llm_aided_ocr

Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.

Support APIs #3

Status: Open (opened by ayoubelmhamdi 11 months ago)

ayoubelmhamdi commented 11 months ago

Is there any plan to restructure the code so that it works uniformly with either Llama2 or an API-based model (such as gpt-3.5-turbo or gpt-4)? That would let this PDF-to-text tool run on any hardware.

https://github.com/Dicklesworthstone/llama2_aided_tesseract/blob/5719a9aede6b0666f6f08d239cac7b1550298b79/tesseract_with_llama2_corrections.py#L180
https://github.com/Dicklesworthstone/llama2_aided_tesseract/blob/5719a9aede6b0666f6f08d239cac7b1550298b79/tesseract_with_llama2_corrections.py#L122
https://github.com/Dicklesworthstone/llama2_aided_tesseract/blob/5719a9aede6b0666f6f08d239cac7b1550298b79/tesseract_with_llama2_corrections.py#L173

Dicklesworthstone commented 11 months ago

Not really, but you’re welcome to submit a PR.

ayoubelmhamdi commented 11 months ago

I don't have much experience with threads and embeddings, but it would be cool to implement them someday.