[MODULE] - OCR image to text

jhoetter commented 1 year ago

Please describe the module you would like to add to bricks

Not really NLP, but going in that direction: if I have an invoice, I want to be able to extract its contents as JSON. This means collecting the positions of the bounding boxes and their texts. Think of an invoice from Kern AI: the invoice number would be in the top right corner. So the JSON could look as follows:

[{
  "position_x1": 50,
  "position_x2": 70,
  "position_y1": 10,
  "position_y2": 15,
  "text": "RE_2023_0001"
}, ...]
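A minimal sketch of how word-level OCR output could be mapped to the JSON shape above. It assumes the OCR step returns a dict of parallel lists with `left`, `top`, `width`, `height` and `text` keys, which is the shape pytesseract's `image_to_data(..., output_type=Output.DICT)` produces; the function name `words_to_boxes` and the sample values are hypothetical, and no OCR engine is actually called here.

```python
import json


def words_to_boxes(ocr):
    """Convert word-level OCR output into a list of bounding-box dicts.

    `ocr` is assumed to be a dict of parallel lists with the keys
    "left", "top", "width", "height" and "text" (an assumption, not
    something the issue specifies).
    """
    boxes = []
    for left, top, width, height, text in zip(
        ocr["left"], ocr["top"], ocr["width"], ocr["height"], ocr["text"]
    ):
        text = text.strip()
        if not text:
            # Skip entries without text (e.g. page- or block-level rows).
            continue
        boxes.append(
            {
                "position_x1": left,
                "position_x2": left + width,
                "position_y1": top,
                "position_y2": top + height,
                "text": text,
            }
        )
    return boxes


# Hypothetical sample data standing in for real OCR output.
sample = {
    "left": [0, 50],
    "top": [0, 10],
    "width": [100, 20],
    "height": [100, 5],
    "text": ["", "RE_2023_0001"],
}

print(json.dumps(words_to_boxes(sample), indent=2))
```

With a real OCR library, only the construction of `sample` would change; the mapping to `position_x1` / `position_x2` / `position_y1` / `position_y2` stays the same.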

Do you already have an implementation?

There are tons of libraries offering that. We can look into both open source versions and premium offerings.

Additional context

See #182

LeonardPuettmann commented 1 year ago

This would open up a completely new area for us. It would be a complicated module as well. Do you get a lot of people requesting this?

jhoetter commented 1 year ago

A couple of data scientists I've talked to at meetups would like to use refinery, but can't without parsing their PDFs into a JSON format. Overall I agree, and since it is a dev tool, people can still build their own pipeline, but I don't think the initial effort would be that large. It's something that can be tested within a week or so (I prototyped a brick like the one above last year in a few hours; it's actually not that complicated with modern libraries :-)). To me, the question is rather how to take image data as input.
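On the open question of image input, one possible approach (an assumption, not something decided in this thread) is to pass the image as a base64-encoded string, so it can travel through a text-only interface and be decoded back to bytes before OCR. The helper names below are hypothetical, and the sample payload is only a PNG file signature followed by placeholder bytes, not a real image.

```python
import base64

# The fixed 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def encode_image(raw: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(raw).decode("ascii")


def decode_image(encoded: str) -> bytes:
    """Decode a base64 string back into raw image bytes.

    validate=True makes malformed input raise an error instead of
    being silently accepted.
    """
    return base64.b64decode(encoded, validate=True)


def looks_like_png(raw: bytes) -> bool:
    """Cheap sanity check on the decoded bytes before running OCR."""
    return raw.startswith(PNG_SIGNATURE)


# Hypothetical payload: a PNG signature plus placeholder bytes.
payload = PNG_SIGNATURE + b"placeholder"
encoded = encode_image(payload)
decoded = decode_image(encoded)
print(decoded == payload, looks_like_png(decoded))
```

The decoded bytes could then be handed to whichever OCR library is chosen; other transports (file paths, URLs) would work just as well.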

All in all, it's something I've already been asked about quite often, but I agree that it is not directly our focus area for now. Let's see if it is requested more and more :)

divyanshukatiyar commented 1 year ago

I can definitely look into this :)

code-kern-ai / bricks

[MODULE] - OCR image to text #184