raveslave opened this issue 4 years ago (status: Open)
Hi @raveslave ,
Thanks for sharing this idea. It's really interesting and looks really cool.
I have a few doubts about the usability, though, as it seems rather complicated to develop and even to use.
Though it seems hard to provide this, we will still take a look at it as this definitely goes in the direction we are aiming: importing / generating DocTypes from OCR.
For reference, we're currently more invested in a text-based import using simple regular expressions or text processing libraries, with OCR handled by something like Tesseract: https://appliedmachinelearning.blog/2018/06/30/performing-ocr-by-running-parallel-instances-of-tesseract-4-0-python/
We can keep this open to discuss further if you want to.
Please see my comments:
True, but the idea is not to rely on the rectangle position. Rather, the template teaches the OCR tool which string to look for. If it fails to find a mandatory field, the script should notify the user that manual attention is needed.
My idea is that you only do this mapping once per supplier. Disregarding OCR, most invoices will be PDFs, so the same principle would apply there, but it would be easier to implement.
True, but most of the time the things you're after are the date and invoice number, which is enough to populate the bare minimum (the mandatory fields) and later match the invoice to a PO.
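To make the per-supplier template idea concrete, here is a minimal sketch in Python. It assumes the invoice text has already been pulled out of the PDF or produced by OCR. The template layout, the `FieldRule` class, the `ACME_TEMPLATE` patterns and the `extract_fields` function are all hypothetical names made up for illustration, not anything that exists in ERPNext.

```python
import re
from dataclasses import dataclass


@dataclass
class FieldRule:
    """One field to look for: a regex with one capture group, plus a mandatory flag."""
    pattern: str
    mandatory: bool = False


# Hypothetical template for a single supplier; it would be defined once per supplier.
ACME_TEMPLATE = {
    "invoice_no": FieldRule(r"Invoice\s*(?:Number|No\.?)\s*[:#]?\s*(\S+)", mandatory=True),
    "date": FieldRule(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", mandatory=True),
    "po_no": FieldRule(r"PO\s*(?:Number|No\.?)\s*[:#]?\s*(\S+)"),
}


def extract_fields(text, template):
    """Return (values, missing), where missing lists mandatory fields not found."""
    values = {}
    missing = []
    for name, rule in template.items():
        match = re.search(rule.pattern, text, flags=re.IGNORECASE)
        if match:
            values[name] = match.group(1)
        elif rule.mandatory:
            missing.append(name)
    return values, missing


sample = "ACME Ltd\nInvoice No: INV-1042\nDate: 2019-03-14\nTotal: 100.00"
values, missing = extract_fields(sample, ACME_TEMPLATE)
if missing:
    print("Manual attention needed, missing:", ", ".join(missing))
else:
    print(values)
```

A non-empty `missing` list is the signal that the document needs manual attention. Optional fields such as the PO number are simply left out when they are not found.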
re: tesseract
Cool tech. Have you tried it on a pile of random invoices?
I'm curious how it works and whether there are ways to get parameterized data back from it.
anyone been looking into this lately?
Hello,
I am currently looking for something like this to use with ERPNext: converting scanned or email-received PDF purchase invoices to text (or even JSON) and using the extracted data to automatically create a purchase invoice in ERPNext. It would also need to pick up the PDF files from the email and attach them (or a link to them) to the relevant purchase invoice. I'm not a programmer, I'm on the financial side. There are commercial solutions available for this functionality, which means it is possible to build.
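For the last step of that workflow, a rough sketch of turning already extracted data into a request against ERPNext's REST API could look like this. It assumes token authentication with an API key and secret, and that the supplier invoice number and date map to the `bill_no` and `bill_date` fields of Purchase Invoice. The function name and the example URL are hypothetical, and a real instance will most likely require more fields (items, at the very least) before it accepts the document. The request is only built here, not sent, and attaching the PDF is not covered.

```python
import json
import urllib.request
from urllib.parse import quote


def build_purchase_invoice_request(base_url, api_key, api_secret, supplier, fields):
    """Build (but do not send) a POST request that would create a Purchase Invoice."""
    payload = {
        "supplier": supplier,
        "bill_no": fields["invoice_no"],  # assumed mapping: supplier invoice number
        "bill_date": fields["date"],      # assumed mapping: supplier invoice date
    }
    url = f"{base_url.rstrip('/')}/api/resource/{quote('Purchase Invoice')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {api_key}:{api_secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_purchase_invoice_request(
    "https://erp.example.com",  # hypothetical instance URL
    "my-key",
    "my-secret",
    "ACME Ltd",
    {"invoice_no": "INV-1042", "date": "2019-03-14"},
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), against a real instance only.
```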
anyone been looking into this lately?
Hi @raveslave, unfortunately, we did not find the time to look any further into this.
Just checking in: is anyone willing to co-sponsor this?
I need to extract key value pairs from PDF tables
@raveslave I need to extract key value pairs from PDF tables
Any progress with this on ERPNext?
Wouldn't it be cool to offer this feature? Basically, allow the user to draw an overlay that helps find key/value pairs, which can then be mapped to the relevant document type -> field in ERPNext!