archywillhe closed 1 year ago
That's not really something you can do with nougat. The model works end to end and doesn't compute bounding boxes for text. You could try to match the text to blocks extracted by e.g. mupdf, which gives you the x,y coordinates. Or, if you really want to use the model, you could try to make sense of the attention maps, but that's not something I'd recommend, really.
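For the mupdf route, here is a rough sketch of what I mean, not something nougat ships. It assumes you use PyMuPDF (`fitz`) to get the blocks, and the helper names (`normalize`, `best_block`, `page_blocks`) and the 0.5 threshold are made up for illustration. Since nougat outputs markup rather than the raw PDF text, the match has to be fuzzy, so this uses `difflib` from the standard library:

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse whitespace and lowercase so layout differences matter less."""
    return " ".join(text.split()).lower()


def best_block(paragraph, blocks, min_ratio=0.5):
    """Return (bbox, ratio) for the block most similar to `paragraph`.

    `blocks` is a list of (x0, y0, x1, y1, text, ...) tuples, the shape
    returned by PyMuPDF's page.get_text("blocks"). Returns None when no
    block reaches `min_ratio` (an arbitrary threshold, tune it).
    """
    target = normalize(paragraph)
    best = None
    for block in blocks:
        x0, y0, x1, y1, text = block[:5]
        ratio = SequenceMatcher(None, target, normalize(text)).ratio()
        if best is None or ratio > best[1]:
            best = ((x0, y0, x1, y1), ratio)
    if best is None or best[1] < min_ratio:
        return None
    return best


def page_blocks(pdf_path, page_number=0):
    """Text blocks of one page via PyMuPDF (pip install pymupdf)."""
    import fitz  # PyMuPDF, imported lazily so the matcher works without it

    with fitz.open(pdf_path) as doc:
        # the last tuple field is the block type: 0 is text, 1 is image
        return [b for b in doc[page_number].get_text("blocks") if b[6] == 0]
```

You would call `best_block(paragraph, page_blocks("paper.pdf", 0))` for each paragraph of the nougat output for that page (`paper.pdf` is a placeholder). The first two values of the returned box are the top-left x,y of the block. Expect misses on equations and tables, where the markup differs a lot from the text layer, and on scanned PDFs, which have no text layer to match against.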
Is there a way to compute bounding boxes for the detected text? Or to know roughly the starting x,y coordinates of a text paragraph?