Dicklesworthstone / llm_aided_ocr

Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.
2.14k stars, 142 forks

Alternative offline LLMs #4

Open adnanPBI opened 5 months ago

adnanPBI commented 5 months ago

Hi, your code uses the offline Llama 2 chat LLM, but I would like to use alternative offline models such as Hugging Face's DistilBERT, RoBERTa, or ALBERT. Do you have any suggestions for using those models from Python?

Backendmagier commented 2 months ago

I think you would just need to update the download_models function to download the model you want. If it is not a Llama model, you also need to change the get_tokenizer function so that it loads the right tokenizer.
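A minimal sketch of what the tokenizer change might look like, assuming the tokenizer is loaded through Hugging Face's `AutoTokenizer`. The mapping `TOKENIZER_REPOS`, the helper `resolve_tokenizer_repo`, the `gpt2` fallback, and the `get_tokenizer` signature shown here are hypothetical illustrations, not code taken from the repository. Keep in mind that DistilBERT, RoBERTa, and ALBERT are encoder-only masked language models rather than text generators, so swapping the tokenizer alone is probably not enough to make them produce corrected text the way a chat model does.

```python
# Hypothetical mapping from a substring of the local model name to a
# Hugging Face tokenizer repo id. These entries are illustrative
# assumptions, not values from the llm_aided_ocr source.
TOKENIZER_REPOS = {
    "distilbert": "distilbert-base-uncased",
    "roberta": "roberta-base",
    "albert": "albert-base-v2",
    "mistral": "mistralai/Mistral-7B-v0.1",
}


def resolve_tokenizer_repo(model_name: str, default: str = "gpt2") -> str:
    """Pick a tokenizer repo id for a model name, falling back to `default`."""
    lowered = model_name.lower()
    for key, repo in TOKENIZER_REPOS.items():
        if key in lowered:
            return repo
    return default


def get_tokenizer(model_name: str):
    """Load the tokenizer matching `model_name` (assumed signature)."""
    # Imported lazily so the name-resolution logic above can be used
    # (and tested) without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(resolve_tokenizer_repo(model_name))
```

With something like this in place, `get_tokenizer("roberta-base")` would download and cache the tokenizer on first use, and `len(get_tokenizer("roberta-base").encode(text))` could be used to count tokens for chunking.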

Shahin-rmz commented 1 month ago

get_tokenizer function and load the right tokenizer

Hi, yes, I would like to use my own local LLM, but it is a bit tricky. Do I just need to adjust the tokenizer?