VikParuchuri / surya

OCR, layout analysis, reading order, table recognition in 90+ languages
https://www.datalab.to
GNU General Public License v3.0

How to run Surya OCR on 8 GB or 6 GB VRAM NVIDIA/AMD GPUs #183

Open kkailaasa opened 2 months ago

kkailaasa commented 2 months ago

Hello, I'm interested in using Surya OCR, but I have two systems (8 GB and 6 GB of VRAM) with less VRAM than the default requirements (>24 GB VRAM).

From my reading of the project description, I understand that Surya can potentially run with lower VRAM by adjusting batch sizes.
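If I read the README correctly, the batch sizes are controlled by environment variables such as `DETECTOR_BATCH_SIZE` and `RECOGNITION_BATCH_SIZE`, which need to be set before surya is imported. A minimal sketch of what I'm planning to try (the values are only my guesses for a 6-8 GB card, not recommended settings):

```python
import os

# Smaller batches should mean less VRAM; these values are untested guesses.
os.environ["DETECTOR_BATCH_SIZE"] = "6"
os.environ["RECOGNITION_BATCH_SIZE"] = "32"

# Import surya only after the variables are set, since its settings
# are read from the environment at import time.
from surya.ocr import run_ocr
```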

Thank you for your help and for creating Surya OCR.

sharabheshwara commented 2 months ago

@VikParuchuri Hi, could you please share some insights on this?

snowfluke commented 2 months ago

@kkailaasa I'm running compiled surya-ocr on an RTX 3050 with 8 GB of VRAM and get decent speed.

newsyh commented 4 weeks ago

@kkailaasa Can you tell us how you did it? Thanks.

RedwindA commented 3 weeks ago

> I'm running compiled surya-ocr on an RTX 3050 with 8 GB of VRAM and get decent speed.

@snowfluke Can you tell us how you did it? Thanks.

waan1 commented 2 weeks ago

Hello, thanks for the good software. Before putting it into production use I ran a small test (below). I have a Linux machine with an Nvidia 4090 (24 GB). Surya uses only about 6.2 GB, and while processing it saturates one CPU thread at 100% while the GPU shows between 0% and 1% load. Recognizing one page (19 KB of recognized text) takes 70 seconds. Detection is fast, but recognition is quite slow.

```
Loaded detection model vikp/surya_det3 on device cuda with dtype torch.float16
Loaded recognition model vikp/surya_rec2 on device cuda with dtype torch.float16
Using device: cuda
Detecting bboxes: 100%|██████████████████████████████████████████| 1/1 [00:00<00:00, 2.79it/s]
Recognizing Text: 100%|██████████████████████████████████████████| 1/1 [01:08<00:00, 68.17s/it]
```

Is it because the recognition step requires more VRAM than I have? If so, can it be configured to use more CPU threads? I also have a second (slower) GPU, a P40. Is it possible to configure, for example, detection to run on the P40 and recognition on the 4090?
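In plain PyTorch I can at least move each model to a different device with `.to()` and raise the CPU thread count with `torch.set_num_threads`; a sketch of what I mean (I don't know whether run_ocr actually supports models on two different devices):

```python
import torch
from surya.model.detection.model import load_model as load_det_model, load_processor as load_det_processor
from surya.model.recognition.model import load_model as load_rec_model
from surya.model.recognition.processor import load_processor as load_rec_processor

# Let PyTorch use more CPU threads for the CPU-bound parts.
torch.set_num_threads(8)

det_processor, det_model = load_det_processor(), load_det_model()
rec_model, rec_processor = load_rec_model(), load_rec_processor()

# Hypothetical split: detection on the P40 (here cuda:1),
# recognition on the 4090 (here cuda:0). run_ocr may assume a
# single device internally, so this is an experiment, not a recipe.
det_model = det_model.to("cuda:1")
rec_model = rec_model.to("cuda:0")
```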

```python
import json
import time

import torch
from PIL import Image, ImageDraw, ImageFont

from surya.ocr import run_ocr
from surya.model.detection.model import load_model as load_det_model, load_processor as load_det_processor
from surya.model.recognition.model import load_model as load_rec_model
from surya.model.recognition.processor import load_processor as load_rec_processor

IMAGE_PATH = "scan.JPEG"

image = Image.open(IMAGE_PATH)
langs = ["pl"]  # Replace with your languages - optional but recommended
det_processor, det_model = load_det_processor(), load_det_model()
rec_model, rec_processor = load_rec_model(), load_rec_processor()

# Check GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Move models to the appropriate device
det_model = det_model.to(device)
rec_model = rec_model.to(device)

# Add timing for text recognition
start_time = time.time()
predictions = run_ocr([image], [langs], det_model, det_processor, rec_model, rec_processor)
end_time = time.time()
```

waan1 commented 1 week ago

Resolved. For some reason the install from a git clone was extremely slow; installing with pip (`pip install surya-ocr`) fixed the issue.