🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
MIT License
742 stars, 75 forks
Collect the important details from an invoice document (PDF) #89
Hi all,
I want to build a project that extracts the important details from invoice PDF documents (invoice number, date, total due, seller name, etc.) as key-value pairs. We generate an hOCR file from each PDF using an OCR engine (Tesseract). Could you please advise how to proceed from the hOCR file to extract key-value pairs using Catalyst?
Or is there another approach to producing key-value pairs with Catalyst?
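To make the goal concrete, here is a minimal sketch of the kind of output we are after. It does not use Catalyst at all: it is plain Python (standard library only) that reads the text of each hOCR `ocr_line` element and applies regular expressions. The sample hOCR snippet, the `HocrLineParser` and `extract_fields` names, and the field patterns are all made up for illustration; real invoices vary far more than this, and a field like seller name would likely need entity recognition or layout information rather than a regex.

```python
import re
from html.parser import HTMLParser


class HocrLineParser(HTMLParser):
    """Collects the text of each hOCR `ocr_line` span as one string (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0   # span nesting depth inside the current ocr_line; 0 means outside
        self._words = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1          # a nested span, e.g. an ocrx_word
        elif "ocr_line" in classes:
            self._depth = 1           # entering a new line
            self._words = []

    def handle_endtag(self, tag):
        if tag != "span" or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:          # the ocr_line span itself just closed
            self.lines.append(" ".join(self._words))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._words.append(data.strip())


# Assumed label wordings and value formats; these are illustrative, not exhaustive.
FIELD_PATTERNS = {
    "Invoice Number": re.compile(r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*(\S+)", re.I),
    "Date": re.compile(r"\bdate\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})", re.I),
    "Total Due": re.compile(r"total\s+due\s*[:\-]?\s*([$€£]?\s?[\d.,]+)", re.I),
}


def extract_fields(hocr_text):
    """Return a dict of field name -> first matching value found in the hOCR lines."""
    parser = HocrLineParser()
    parser.feed(hocr_text)
    fields = {}
    for line in parser.lines:
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(line)
            if match and name not in fields:
                fields[name] = match.group(1).strip()
    return fields


# Hand-written hOCR fragment in the shape Tesseract emits (made-up values).
SAMPLE_HOCR = """
<div class='ocr_page'>
 <span class='ocr_line' title='bbox 10 10 300 30'>
  <span class='ocrx_word'>Invoice</span> <span class='ocrx_word'>Number:</span>
  <span class='ocrx_word'>INV-1042</span>
 </span>
 <span class='ocr_line' title='bbox 10 40 300 60'>
  <span class='ocrx_word'>Date:</span> <span class='ocrx_word'>12/03/2021</span>
 </span>
 <span class='ocr_line' title='bbox 10 70 300 90'>
  <span class='ocrx_word'>Total</span> <span class='ocrx_word'>Due:</span>
  <span class='ocrx_word'>$1,250.00</span>
 </span>
</div>
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_HOCR))
```

This only covers labels and values that sit on the same OCR line. What we would like to know is whether Catalyst can take over the part the regexes do here, or handle the cases they cannot.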
Thanks in advance.