Closed: jeffbl closed this issue 2 years ago
Something that we're hearing again and again on the UX side is that people want to know if there's text or labels on images. It's also becoming clear that all the audio work on charts isn't going to be compelling unless we are able to get labels for data. I want to push this issue to the forefront because it's been something that users consistently mention.
@Cybernide Note that specifically for plots and charts, we'll have to discuss the priority of this vs. #129, which would only work for some charts, but wouldn't have the same accuracy issues we're likely to encounter with trying to extract it from the graphics. But of course it also will not be general beyond charts/plots.
Sure - I think that having accurate chart data alone would make a strong case for adopting the extension. However, at this moment, feedback is increasingly suggesting that users would eagerly adopt our extension if it provided the words in images. As to what to prioritize, we'll try to figure it out.
After discussion this morning, investigation should include:
After investigation, and depending on findings, generate new separate work items for creating an initial preprocessor, then building up from there.
Once we have chosen a solution (Azure or something else that looks more promising):
Once we can find text in regions, new functionality is possible, e.g.:
@gp1702 am I forgetting anything from this morning?
@Cybernide @Sabrina-Knappe Anything to add in terms of key issues for investigation as Ben and Aidan go forward on OCR?
Moving this out of Dec14, but I'd like to see an update here on initial investigation/progress before the new year.
I'm going to add issues as I think of them here:
Having a LOT of detected text is analogous to having a lot of the same kind of object in a picture: it means we need parameters from the ML side that let us prioritize, order, and play text back louder or softer, etc.
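As a rough illustration of the kind of prioritization this could involve, here is a minimal sketch (hypothetical field names, not the actual preprocessor schema) that filters OCR results by confidence and orders them by visual prominence so the most important text can be rendered first or louder:

```python
from dataclasses import dataclass

@dataclass
class DetectedText:
    text: str
    confidence: float   # OCR confidence, 0.0-1.0
    bbox: tuple         # (x, y, width, height) in pixels

def prioritize(detections, min_confidence=0.6, max_items=10):
    """Drop low-confidence text and order the rest by area
    so downstream audio/haptic rendering can emphasize it."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: d.bbox[2] * d.bbox[3], reverse=True)
    return kept[:max_items]

# Example: a large chart title should outrank a small tick label.
items = [
    DetectedText("Sales by Region", 0.98, (40, 10, 300, 40)),
    DetectedText("10%", 0.91, (120, 200, 40, 20)),
    DetectedText("??", 0.30, (5, 5, 10, 10)),
]
for d in prioritize(items):
    print(d.text)
```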
Useful docs:
The Azure Read API provides bounding boxes, which should allow us to identify text contained within objects identified by other preprocessors.
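A minimal sketch of how that linking might work, assuming axis-aligned boxes (the Read API actually returns bounding polygons, so a real implementation would first convert those to rectangles or use polygon intersection):

```python
def overlap_ratio(text_box, obj_box):
    """Fraction of the text box's area that falls inside the object box.
    Boxes are (x_min, y_min, x_max, y_max) in pixel coordinates."""
    x1 = max(text_box[0], obj_box[0])
    y1 = max(text_box[1], obj_box[1])
    x2 = min(text_box[2], obj_box[2])
    y2 = min(text_box[3], obj_box[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    text_area = (text_box[2] - text_box[0]) * (text_box[3] - text_box[1])
    return inter / text_area if text_area else 0.0

def link_text_to_objects(text_items, objects, threshold=0.5):
    """Attach each OCR result to the detected object that contains most of it."""
    links = []
    for text, t_box in text_items:
        best = max(objects, key=lambda o: overlap_ratio(t_box, o[1]), default=None)
        if best and overlap_ratio(t_box, best[1]) >= threshold:
            links.append((text, best[0]))  # (label text, object name)
    return links

# e.g. the word "Mars" landing inside the bounding box of a detected planet
print(link_text_to_objects(
    [("Mars", (110, 120, 150, 135))],
    [("planet", (100, 100, 200, 200)), ("sun", (300, 50, 500, 250))],
))
```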
@BenMacnaughton @aidanwilliams09 I'm moving this to Jan31, but also changing title to "implement OCR preprocessor", rather than logging a new issue. Acceptable?
Sounds good @jeffbl
As we've seen with the charts preprocessor, text in images can be crucial for understanding, e.g., text on top of / next to wedges in a pie chart. However, the charts model does not currently extract this information, so although the user can know the percentage of different wedges, they have no idea what the wedges represent.
The task is to explore options for extracting text from images and aligning/linking it with information from other preprocessors that extract locations of objects from graphics, like object detection or the charts preprocessor. Is this even feasible?
Note that for exploration with a Haply, just adding the ability to read out text areas when the cursor hits them may be enough in some cases, since proximity to other layers (like the wedges of a pie chart) can supply the relevant meaning.
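For that read-on-hit behaviour, a simple point-in-rectangle hit test over the OCR bounding boxes may suffice; a minimal sketch (hypothetical names, and assuming the cursor and text regions share the same pixel coordinate space):

```python
def hit_text_region(cursor, text_regions):
    """Return the text under the haptic cursor, if any.
    cursor: (x, y); text_regions: list of (text, (x_min, y_min, x_max, y_max))."""
    x, y = cursor
    for text, (x0, y0, x1, y1) in text_regions:
        if x0 <= x <= x1 and y0 <= y <= y1:
            return text
    return None

# e.g. announce "25%" when the cursor enters the label next to a pie wedge
regions = [("25%", (140, 60, 180, 80)), ("Rent", (300, 200, 350, 220))]
print(hit_text_region((150, 70), regions))  # -> "25%"
```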
This capability may also be key for things like diagrams from textbooks, like the solar system graphic from the NFB demos.