Open igneus opened 2 months ago
I'll have to look, but I'm guessing this is a Poppler-to-QtPdf translation issue like #29.
I think the problem is in PdfPage.text() -- specifically, not knowing what units to use for QPdfDocument.getSelection(). ~Searching online just returns its unhelpful documentation and a forum post or two from others who don't know either.~
Edit: QPdfDocument measures page size in points, so I'm presuming those are the units for anything not otherwise specified. Now can someone clarify the units for our rect before and after self.mapFromPage().rect()?
"Copy Selected Text" does kind of work if you change the first line of PdfPage.text() from
rectf = rect.toRectF()
to
rectf = self.mapFromPage(self.pageWidth, self.pageHeight).rect(rect)
like its counterpart in poppler.py. I say it only "kind of" works because the text it copies is rarely the text you've highlighted.
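On the units question, here is a minimal sketch of the conversion that seems to be needed. It assumes the rect handed to PdfPage.text() is in rendered pixels, that QPdfDocument wants points, and that both systems use a top-left origin with no rotation. The function name and the plain (x, y, width, height) tuples are hypothetical, chosen for illustration, and are not part of qpageview or Qt.

```python
def pixels_to_points(rect, rendered_size, page_point_size):
    """Scale an (x, y, width, height) rect from rendered pixels to PDF points.

    Assumes a shared top-left origin and a plain per-axis scale
    (no rotation), which may not hold for every page.
    """
    x, y, w, h = rect
    rendered_w, rendered_h = rendered_size
    points_w, points_h = page_point_size
    sx = points_w / rendered_w  # points per pixel, horizontally
    sy = points_h / rendered_h  # points per pixel, vertically
    return (x * sx, y * sy, w * sx, h * sy)


# A US Letter page (612 x 792 pt) rendered at 2 pixels per point
print(pixels_to_points((200, 100, 400, 50), (1224, 1584), (612, 792)))
```

If the copied text is still wrong after a conversion like this, the units are probably not the only problem.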
This may be a bug in Qt. Here's a test program:
import sys
from PyQt6.QtCore import QCoreApplication, QRectF, QPointF
from PyQt6.QtPdf import QPdfDocument
a = QCoreApplication([])
doc = QPdfDocument(a)
doc.load(sys.argv[1])
# Get a QTextSelection for all text on the first page
everything = doc.getAllText(0)
# Where on the page is the text found?
rect1 = everything.boundingRectangle()
print("rect1 =", rect1)
# Now attempt to select all text in that area manually
# (if this works, rect2 == rect1)
selection = doc.getSelection(0, rect1.topLeft(), rect1.bottomRight())
rect2 = selection.boundingRectangle()
print("rect2 =", rect2)
# This has no practical value besides checking if QPdfDocument works
# since we can't convert indexes to page coordinates
selection = doc.getSelectionAtIndex(0, everything.startIndex(), everything.endIndex())
rect3 = selection.boundingRectangle()
print("rect3 =", rect3)
The expected result is that rect1 and rect2 are equal, and rect3 is at least reasonably close (it may vary slightly because it's looking up by index in QTextSelection's internal list of strings rather than by page coordinates).
Instead, here's what I get testing it on one of my scores:
rect1 = PyQt6.QtCore.QRectF(17.0, 19.0, 567.0, 756.0)
rect2 = PyQt6.QtCore.QRectF()
rect3 = PyQt6.QtCore.QRectF(19.0, 19.0, 565.0, 732.0)
That's a pretty tiny area for rect2. No wonder we can't find any text in it.
Of course, this potentially being a Qt bug doesn't rule out separate bugs in my own code. :)
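To put a number on "reasonably close", here is a small sketch that compares the rectangles printed above edge by edge. The helper names are hypothetical, and the rectangles are plain (x, y, width, height) tuples copied from that output, with the empty QRectF() written as all zeros.

```python
def edges(rect):
    """Convert (x, y, width, height) to (left, top, right, bottom)."""
    x, y, w, h = rect
    return (x, y, x + w, y + h)


def max_edge_difference(a, b):
    """Largest distance between corresponding edges of two rects."""
    return max(abs(p - q) for p, q in zip(edges(a), edges(b)))


rect1 = (17.0, 19.0, 567.0, 756.0)
rect2 = (0.0, 0.0, 0.0, 0.0)  # QRectF() is the null rectangle
rect3 = (19.0, 19.0, 565.0, 732.0)

print("rect1 vs rect3:", max_edge_difference(rect1, rect3))
print("rect1 vs rect2:", max_edge_difference(rect1, rect2))
```

With real QRectF objects, rect.getRect() should give the same kind of tuple in PyQt6, though I haven't checked that here.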
I assume that QPdfDocument.getSelection() uses the same units as the only QPdfDocument method returning page dimensions - QPdfDocument.pagePointSize().
from PyQt6.QtCore import QPointF
# qdoc is a QPdfDocument with a file already loaded
topleft = QPointF(0, 0)
size = qdoc.pagePointSize(0)
bottomright = QPointF(size.width(), size.height())
print(qdoc.getSelection(0, topleft, bottomright).text()) # finds no text
I also tried it in C++ to rule out a shortcoming in the SIP wrapper. The result is the same as in Python: getAllText() finds text, getSelection().text() doesn't.
#include <iostream>
#include <QPdfDocument>
int main()
{
    QPdfDocument qdoc;
    qdoc.load("test.pdf");

    // Text of the whole first page, for comparison
    std::cout << "Whole page:" << std::endl;
    std::cout << qdoc.getAllText(0).text().toStdString() << std::endl;
    std::cout << std::endl;

    // Select from the top-left corner to the bottom-right corner, in points
    QPointF topleft(0, 0);
    QSizeF size = qdoc.pagePointSize(0);
    QPointF bottomright(size.width(), size.height());
    std::cout << "Selection in size of the whole page:" << std::endl;
    std::cout << qdoc.getSelection(0, topleft, bottomright).text().toStdString() << std::endl;

    qdoc.close();
    return 0;
}
With bmjcode/qpageview#3 merged the text selection mostly works, but it has some shortcomings.
QPdfDocument.getSelection() is implemented.)
In (current qt6) Frescobaldi I select a portion of the musicview by dragging the mouse with the right button pressed. Then I right-click the selection to get the context menu. The "Copy Selected Text" menu item doesn't appear, although the selection contains text, i.e. Rubberband.selectedText() probably doesn't work.