ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render

Path segment coordinates untransformed? #100

Closed cemerick closed 1 year ago

cemerick commented 1 year ago

The bounding boxes for text objects and characters always appear to be transformed according to the matrices in scope when they are output. However, path segment coordinates do not seem to be. For example, here's an excerpt from the beginning of a PDF content stream:

1 0 0 -1 0 850.393677 cm
q
0 0 0 RG /a0 gs
0.75 w
0 J
0 j
[] 0.0 d
4 M q 1 0 0 1 0 0 cm
50 0.395 m 50 252.164 l S Q

The matrix is set with 1 0 0 -1 0 850.393677 cm, and there's a two-segment path, 50 0.395 m 50 252.164 l. PdfPathSegment.point() for these segments (one move, one line-to) yields exactly what is encoded (50.0, 0.395), (50, 252.164), whereas I would have expected these "userland" coordinates to be pre-transformed as the bounding boxes for chars are.
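To make the expected result concrete, here is a minimal sketch of the workaround in plain Rust, with no dependency on pdfium-render. The `Matrix` struct and its `apply` method are hypothetical names used only for illustration; the formula is the standard PDF affine mapping x' = a·x + c·y + e, y' = b·x + d·y + f. The inner `1 0 0 1 0 0 cm` is the identity, so only the outer matrix matters here.

```rust
/// A PDF affine matrix [a b c d e f], as written by the `cm` operator.
/// (Hypothetical helper type for illustration, not part of pdfium-render.)
#[derive(Clone, Copy, Debug)]
struct Matrix {
    a: f64,
    b: f64,
    c: f64,
    d: f64,
    e: f64,
    f: f64,
}

impl Matrix {
    /// Maps (x, y) to (a*x + c*y + e, b*x + d*y + f).
    fn apply(&self, x: f64, y: f64) -> (f64, f64) {
        (
            self.a * x + self.c * y + self.e,
            self.b * x + self.d * y + self.f,
        )
    }
}

fn main() {
    // The matrix from the content stream: 1 0 0 -1 0 850.393677 cm
    let ctm = Matrix { a: 1.0, b: 0.0, c: 0.0, d: -1.0, e: 0.0, f: 850.393677 };

    // The raw segment points: 50 0.395 m, 50 252.164 l
    for (x, y) in [(50.0, 0.395), (50.0, 252.164)] {
        let (tx, ty) = ctm.apply(x, y);
        println!("({x}, {y}) -> ({tx:.6}, {ty:.6})");
    }
}
```

With this matrix the x coordinates are unchanged and each y is flipped to 850.393677 − y, so the two points land at roughly (50, 849.998677) and (50, 598.229677).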

To be clear, the "workaround" isn't hard, and I know pdfium-render is fundamentally just proxying what pdfium's APIs provide. Is that good in this case, though?

I personally can't think of a reason to access untransformed path coordinates, so I assume anyone accessing segment data will always be applying the current matrix anyway. If I'm wrong about that, then perhaps this falls into the same category as #25, where point() maybe should be renamed to raw_point(), with a separate transformed_point() provided that implicitly does the transformation with the current matrix.

ajrcarey commented 1 year ago

Hi @cemerick, thank you for raising the issue.

I think your suggestion of following #25 is the right one, with a function that returns whatever Pdfium returns, and a second function that adjusts the return value in some way.

I like the idea of having a PdfMatrix::apply_to(PdfPoint) function (or perhaps PdfPoint::transform(PdfMatrix), or both). If you have already written something similar and would like to share it, this would be an ideal time.

I'll try to find some time to attend to this later this week.

ajrcarey commented 1 year ago

Well, there is at least one situation where you'd want to access untransformed path segment coordinates: if you're trying to duplicate the object.
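A small sketch of why duplication needs the raw values, in plain Rust and independent of pdfium-render: the `PathObject` struct and `rendered` method below are hypothetical stand-ins, assuming an object stores raw points plus a matrix that is applied at output time. If the copy were built from already-transformed points while also carrying the same matrix, the matrix would be applied twice.

```rust
type Matrix = [f64; 6]; // [a, b, c, d, e, f]
type Point = (f64, f64);

/// Standard PDF affine mapping of a point through a matrix.
fn apply(m: Matrix, (x, y): Point) -> Point {
    let [a, b, c, d, e, f] = m;
    (a * x + c * y + e, b * x + d * y + f)
}

/// Hypothetical stand-in for a path object: raw points plus its own matrix.
struct PathObject {
    matrix: Matrix,
    points: Vec<Point>,
}

impl PathObject {
    /// Where the points end up once the object's matrix is applied.
    fn rendered(&self) -> Vec<Point> {
        self.points.iter().map(|&p| apply(self.matrix, p)).collect()
    }
}

fn main() {
    let original = PathObject {
        matrix: [1.0, 0.0, 0.0, -1.0, 0.0, 850.393677],
        points: vec![(50.0, 0.395), (50.0, 252.164)],
    };

    // Correct duplicate: copy the raw points and the matrix.
    let good_copy = PathObject {
        matrix: original.matrix,
        points: original.points.clone(),
    };

    // Incorrect duplicate: copy transformed points, then apply the matrix again.
    let bad_copy = PathObject {
        matrix: original.matrix,
        points: original.rendered(),
    };

    println!("original: {:?}", original.rendered());
    println!("good copy: {:?}", good_copy.rendered());
    println!("bad copy: {:?}", bad_copy.rendered());
}
```

The good copy renders at the same place as the original, while the bad copy does not: with this particular flip matrix, the second application undoes the first and the path ends up back at its raw coordinates.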

We also need to be a little careful in changing the default behaviour of PdfPathSegment::point(), since it's used by font glyphs and clip paths as well as path objects.

I've added a new PdfPagePathObjectSegments::transform() function, so you can apply a transformation matrix to path segment coordinates as you're iterating over the segments in a path object, like so:

for segment in my_page_path_object.segments().transform(my_page_path_object.matrix()?).iter() {
    // segment.point() will now return transformed coordinates
}

Or you can return the transformed point of just a single segment:

my_page_path_object.segments().transform(my_page_path_object.matrix()?).get(segment_index)

Or you can transform the points yourself, independently of the segments collection:

my_page_path_object.matrix()?.apply_to_points(my_page_path_object.segments().get(segment_index))

Whichever approach fits best with your use-case.

Added new matrix math functions to PdfMatrix. Added new PdfRect::transform() and PdfMatrix::apply_to_points() functions for applying transformation matrices directly to rectangles and points. Added new PdfPagePathObjectSegments::raw() and PdfPagePathObjectSegments::transform() functions to allow retrieval of, and iteration over, raw or transformed path segment coordinates, respectively. Used new matrix math functions in PdfMatrix to simplify implementation of PdfRenderConfig. Updated documentation.

ajrcarey commented 1 year ago

Added additional test coverage to confirm transformation matrix applied as expected during iteration over PdfPagePathObjectSegments::transform().iter(). Updated documentation. Ready to release as part of crate version 0.8.10.

As there have been no further comments and I believe the problem is resolved, I am closing the issue. Feel free to re-open if you feel the problem has not been resolved.