Open devtanna opened 3 weeks ago
@devtanna I'd be happy to review, but I'm not sure that kor
style input is helpful here since kor
mostly helps with defining reference examples when the inputs are text.
I would suggest trying langchain
with a prompt that contains an image and using a pydantic model as a tool (if there are multi modal models that also support function calling). What do you think?
Hello 👋
Now that many models support image input as part of the prompt, what do you think of
kor
having support for parsing data from images? I would love to try and put up a draft PR :)The typical use case would be, user inputs a pdf invoice, it's converted to an image, image is input to kor for data extraction. Currently, the pdf is converted to text and then input to kor for data extraction. The image flow is really advantageous when the document has handwritten parts.