kirillt opened 2 years ago
I would suggest we go further and use Tesseract for OCR text recognition of images and of PDF documents in English (and possibly other languages). This way, we could have text metadata attached to every PDF and image file, not only to plain text files.
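Attaching text metadata to PDFs and images as well as plain text files amounts to routing each file to the right extractor. The following is a minimal Python sketch under stated assumptions. The extension split and the names `extract_text` and `build_text_metadata` are hypothetical. The OCR step is passed in as a callable, so the sketch runs without Tesseract installed. With Tesseract, that callable could wrap `pytesseract.image_to_string` for images. PDFs would first need to be rendered to images, since Tesseract does not read PDF input directly.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical split of file types that go through OCR.
OCR_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff"}


def extract_text(path: Path, ocr: Callable[[Path], str]) -> str:
    """Return a file's text: read plain text directly, OCR images/PDFs."""
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if suffix in OCR_EXTENSIONS:
        return ocr(path)
    return ""  # unsupported type: no text metadata


def build_text_metadata(paths: Iterable[Path],
                        ocr: Callable[[Path], str]) -> Dict[str, str]:
    """Map each file path (as a string) to its extracted text."""
    return {str(p): extract_text(p, ocr) for p in paths}
```

Injecting `ocr` keeps the metadata layer independent of the OCR engine and language packs, so switching languages or backends does not touch this code.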
The following points must be taken into account:
TryGetBoundingBox function for highlighting results in PDF and image files in the detailed search results view.

Good thoughts, I've just created a separate issue for the text layer, since it can also be used for tag suggestions: #183
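The TryGetBoundingBox point above comes down to keeping a box for every recognised word and looking up the boxes that match the query. The following is a minimal Python sketch of that lookup. It assumes word-level OCR output as parallel lists under the keys `text`, `left`, `top`, `width` and `height`, which is the layout `pytesseract.image_to_data(..., output_type=Output.DICT)` returns. The function name `find_word_boxes` is hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def find_word_boxes(ocr_data: Dict[str, list], query: str) -> List[Box]:
    """Return boxes of OCR words equal to `query`, ignoring case."""
    needle = query.casefold()
    boxes: List[Box] = []
    for i, word in enumerate(ocr_data["text"]):
        # Strip surrounding punctuation so "Report," still matches "report".
        if word.strip(".,;:!?\"'()").casefold() == needle:
            boxes.append((ocr_data["left"][i], ocr_data["top"][i],
                          ocr_data["width"][i], ocr_data["height"][i]))
    return boxes
```

The detailed results view could then draw a rectangle for each returned box over the rendered page or image.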
It should be possible to filter documents by the presence of a query word in their text content. For this, it's necessary to implement:
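One possible shape for this filter is a small inverted index from words to documents. The following is a minimal Python sketch. It assumes each document's text has already been extracted (for example by OCR) into a dict keyed by a hypothetical document id. The names `build_index` and `filter_by_word` are illustrative and are not existing project APIs. Matching is case-insensitive and on whole words only.

```python
import re
from typing import Dict, List, Set

WORD_RE = re.compile(r"\w+")


def build_index(texts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: case-folded word -> ids of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in texts.items():
        for word in WORD_RE.findall(text.casefold()):
            index.setdefault(word, set()).add(doc_id)
    return index


def filter_by_word(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return sorted ids of documents whose text contains `query` as a word."""
    return sorted(index.get(query.casefold(), set()))
```

Building the index once per extraction keeps each query a single dictionary lookup rather than a scan over all document texts.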