explosion / spacy-streamlit

👑 spaCy building blocks and visualizers for Streamlit apps
https://share.streamlit.io/ines/spacy-streamlit-demo/master/app.py
MIT License

Text from a scanned document. #24

Closed ericvanderlinden closed 2 years ago

ericvanderlinden commented 3 years ago

Many research institutes in the humanities scan documents and books and extract the text via OCR. We can analyze the text, but is there a standard way to connect the scan and the text when you know the coordinates of each word or phrase on the image?

polm commented 2 years ago

This is a usage question so I'm moving it to discussions.

polm commented 2 years ago

Wait, sorry, I thought this was on the main spaCy repo. I'm not sure why you're asking this on this repo, since it doesn't seem to have anything to do with Streamlit, but there is not a standard way to do that in spaCy right now.

We are working on methods to pass in extra data, such as page coordinates, but we're still figuring out the right way to do this. For now you can see the FAQ.
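As an illustration only, and not a spaCy feature, one way to keep OCR text and page coordinates connected in the meantime is to record the character span of every OCR word when building the text, then look up boxes by character offset. Everything in this sketch is an assumption made for the example: the `OcrWord` type, the `(left, top, right, bottom)` pixel box layout, the helper names, and joining words with single spaces.

```python
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical box layout: (left, top, right, bottom) in image pixels.
BBox = Tuple[int, int, int, int]


class OcrWord(NamedTuple):
    """One word as reported by an OCR engine (hypothetical structure)."""
    text: str
    bbox: BBox


def build_text_and_index(words: List[OcrWord]) -> Tuple[str, List[Tuple[int, int, BBox]]]:
    """Join OCR words with single spaces and record each word's character span."""
    parts: List[str] = []
    spans: List[Tuple[int, int, BBox]] = []
    pos = 0
    for word in words:
        if parts:
            pos += 1  # account for the joining space
        spans.append((pos, pos + len(word.text), word.bbox))
        parts.append(word.text)
        pos += len(word.text)
    return " ".join(parts), spans


def boxes_for_span(spans: List[Tuple[int, int, BBox]], start_char: int, end_char: int) -> List[BBox]:
    """Return the boxes of all words overlapping the half-open range [start_char, end_char)."""
    return [bbox for start, end, bbox in spans if start < end_char and end > start_char]


def union(boxes: List[BBox]) -> Optional[BBox]:
    """Smallest box enclosing all given boxes, or None if there are none."""
    if not boxes:
        return None
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


words = [
    OcrWord("Text", (10, 10, 50, 30)),
    OcrWord("from", (60, 10, 100, 30)),
    OcrWord("a", (110, 10, 120, 30)),
    OcrWord("scan", (130, 10, 180, 30)),
]
text, spans = build_text_and_index(words)
# Region on the image covering the phrase "from a" (characters 5 to 11).
region = union(boxes_for_span(spans, 5, 11))
```

If `text` is the exact string passed to a spaCy pipeline, character offsets such as `token.idx` or `span.start_char` and `span.end_char` can be used as the lookup range, since they index into that same string.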

Closing this since there's no action to be taken. If you want to discuss it more, open a Discussion at the main spaCy repo, though you might want to look at the existing threads on OCR first.