PlutosCapstone / Plutos

SE Capstone
MIT License

peer-review[team 23]: OCR system accuracy #38

Open aarnphm opened 1 week ago

aarnphm commented 1 week ago

Artifact Under Review

Team Number for Team Doing the Review

23

Description of Issue

Hey, kudos to the team for the first revision of the SRS. I'm excited to see where this is heading.

I have one question regarding the OCR system accuracy. The team mentioned using Tesseract or a third-party API for this functionality, and requirement NFR4 mandates that the output of this OCR pipeline be 90% accurate.

These OCR pipelines handle PDF files relatively well, but I have yet to see a case where they handle receipts reliably. The problem arises during text segmentation, where certain words and lines are not captured (low image quality, blurred text, very long text).

How would the team address this problem? Would it be more suitable to build the OCR pipeline on an open-source model and fine-tune it on a set of receipts to achieve similar accuracy?
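
On a related note, it may help to pin down how the 90% in NFR4 would be measured. The sketch below is one possible reading, not something taken from the SRS: it assumes "accuracy" means character-level accuracy (1 minus the character error rate) against a hand-labelled transcription of the receipt. The function names `levenshtein`, `char_accuracy`, and `meets_nfr4` are hypothetical and only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Character-level accuracy: 1 - CER, clamped to the range [0, 1]."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    cer = levenshtein(ocr_text, ground_truth) / len(ground_truth)
    return max(0.0, 1.0 - cer)


def meets_nfr4(ocr_text: str, ground_truth: str, threshold: float = 0.90) -> bool:
    """Assumption: NFR4's 90% is a per-receipt character accuracy threshold."""
    return char_accuracy(ocr_text, ground_truth) >= threshold


if __name__ == "__main__":
    truth = "TOTAL 12.50"
    ocr = "T0TAL 12.5O"  # two typical OCR confusions between O and 0
    print(char_accuracy(ocr, truth), meets_nfr4(ocr, truth))
```

Under this reading, two confused characters on a short line are already enough to fall below the threshold, which is why the choice of metric (character, word, or field level) matters for receipts.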