Implement layout extraction

OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

https://www.OpenAdapt.AI

MIT License


Implement layout extraction #187

Open abrichr opened 1 year ago

abrichr commented 1 year ago

In order to support https://github.com/MLDSAI/OpenAdapt/issues/157, we want to extract structured information from documents.

See https://huggingface.co/docs/transformers/model_doc/layoutlm for a possible implementation (are there alternatives?)

We want a LayoutExtractionReplayStrategyMixin that implements:

from numpy import ndarray
def get_layout(self, image: ndarray):
    # `image` may be a screenshot, maybe something else
    ...
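
A minimal sketch of how the mixin could be laid out, under some assumptions: the `LayoutElement` result type, the `_extract_layout` hook, and the `WholeImageLayout` placeholder are all hypothetical names introduced for illustration, not part of OpenAdapt or LayoutLM. The idea is that `get_layout` validates the input and a model-specific subclass (LayoutLM or an alternative) fills in the hook.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # numpy is only needed for the annotations in this sketch
    from numpy import ndarray


@dataclass
class LayoutElement:
    """Hypothetical result type: one labelled region of the image."""

    label: str
    bbox: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


class LayoutExtractionReplayStrategyMixin:
    """Sketch only: validates the input, then defers to a model-specific hook."""

    def get_layout(self, image: "ndarray") -> list[LayoutElement]:
        # Accept grayscale (H, W) or colour (H, W, C) arrays.
        if image.ndim not in (2, 3):
            raise ValueError(f"expected a 2D or 3D image array, got {image.ndim}D")
        return self._extract_layout(image)

    def _extract_layout(self, image: "ndarray") -> list[LayoutElement]:
        # Hypothetical hook: a LayoutLM-backed (or other) subclass overrides this.
        raise NotImplementedError


class WholeImageLayout(LayoutExtractionReplayStrategyMixin):
    """Placeholder backend that reports the whole image as one region."""

    def _extract_layout(self, image: "ndarray") -> list[LayoutElement]:
        height, width = image.shape[:2]
        return [LayoutElement(label="page", bbox=(0, 0, width, height))]
```

With numpy installed, `WholeImageLayout().get_layout(np.zeros((4, 6, 3), dtype=np.uint8))` returns a single `LayoutElement` with `bbox=(0, 0, 6, 4)`. Keeping the model call behind one hook would let alternatives to LayoutLM be swapped in without changing callers.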

FFFiend commented 1 year ago

~~For the image param, would it be called using the image property in Screenshots? I.e., if we have a Screenshot object, then get_layout(screenshot.image())?~~

Follow-up question: why do we wrap the image as an np array? Is it to leverage NumPy's speed when moving the image around?

Edit: nvm I just looked at PIL and Image documentation :D