ciur / papermerge

Open Source Document Management System for Digital Archives (Scanned Documents)
https://papermerge.com
Apache License 2.0
2.55k stars 267 forks

Option to display OCR'ed text #424

Closed (eric-saintetienne closed 2 years ago)

eric-saintetienne commented 3 years ago

As of 2.0, the OCR text is displayed on top of the image, as an overlay. Displaying the OCR'ed text in a text area would be useful, at least to check that OCR worked as expected (not many recognition errors) but also to select/copy the OCR'ed text to the clipboard for saving.

The implementation is up to you. I think a textarea could be added under the metadata, on the right-hand side of a file, or it could be a separate page containing a textarea, accessed via a new entry in the contextual (right-click) menu for a given document.

The alternative is to open the image viewer and do Ctrl+A, but there are problems:

  1. Ctrl+A selects the whole page, including a bunch of other text (whatever else is displayed on the page, such as the contents of the HTML divs and spans)
  2. Once selected with Ctrl+A, the overlaid OCR'ed text is still hard to read (it's not as readable as a textarea)

Thanks!

ajarzyn commented 3 years ago

Hello @eric-saintetienne,

You can already do this, but I must admit the option is rather hidden, so a good suggestion might be to make it more accessible.

How to view OCRed text of the page:

  1. Open your document
  2. Select this icon in the top-left corner: [image]
  3. Select the page whose OCR text you would like to see (this is important: without a selection, the option won't work)
  4. Right-click the page
  5. Select "View OCRed text"

I could make a pull request to add this description to the documentation. @ciur, would that be a good idea?

ciur commented 2 years ago

The feature is available in 2.1.0x and is more intuitive to use. Here is a quick demo:

[image: select-first-page-is-not-necessary-anymore]