PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

[CTS 14] Handling SVG? #89

Closed PhilterPaper closed 3 months ago

PhilterPaper commented 6 years ago

PDF::Builder currently handles several image formats (GIF, JPEG, PNG, TIFF, etc.), any of which can be dynamically produced (on the fly) for a web page. Another increasingly popular format is SVG (Scalable Vector Graphics), which can be easily produced on the fly, and displayed by a browser. Would it be worthwhile to handle SVG graphics in the same manner as other graphics? The basic SVG commands are fairly simple to parse into PDF drawing commands, but SVG can also embed images within its code, adding a layer of complication. The alternative could be to use an external utility to convert SVG content into an already-supported image format, but this would require that the user install some external program(s). Another choice could be to add hooks to permit invoking a user-supplied command line utility to convert SVG (or an arbitrary image format) into a supported image format.
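The "hooks" idea could look roughly like the sketch below. It is only an illustration: the function names, the %in and %out tokens, and the converter program in the example template are all made up here, and nothing like this exists in PDF::Builder as far as this thread says.

```perl
use strict;
use warnings;

# Hypothetical hook: expand a user-supplied command template (a list of
# arguments) by replacing the tokens %in and %out with file names.
sub build_converter_command {
    my ($template, $in_file, $out_file) = @_;
    my %subst = ('%in' => $in_file, '%out' => $out_file);
    return map { my $arg = $_; $arg =~ s/(%in|%out)/$subst{$1}/g; $arg } @$template;
}

# Run the converter without going through a shell, then check that it
# actually wrote a non-empty output file.
sub run_converter {
    my ($template, $in_file, $out_file) = @_;
    my @cmd = build_converter_command($template, $in_file, $out_file);
    system(@cmd) == 0 or die "converter failed: @cmd\n";
    -s $out_file      or die "converter produced no output: $out_file\n";
    return $out_file;
}

# Example template; the program name and its flags are placeholders.
my @template = ('my-svg-converter', '--input', '%in', '--output', '%out');
print join(' ', build_converter_command(\@template, 'logo.svg', 'logo.png')), "\n";
```

The resulting file would then be loaded with whichever existing image method matches the output format. Passing the command as a list keeps file names with spaces from being split by a shell.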

An existing file with SVG or image content (with the SVG possibly embedding yet another image file) could either be read in or processed externally, while in-line SVG command text might be parsed directly by image_svg(), which could also pull in image files. Anyway, there are many combinations and possibilities for SVG support: direct support (converting to PDF primitives) plus embedded images, or external conversion of the SVG to a supported image format. A program wishing to output PDF might also dynamically generate SVG for document graphics in lieu of using PDF primitives directly, for example when generating content suitable for either browser (HTML) or PDF output.

PhilterPaper commented 1 year ago

Oops, good catch. Both @opts in the POD should be %opts.

I might add %opts to bogen(), if I can keep compatibility with the optional parameter list currently in bogen(). I'm thinking about it. I also want to keep API compatibility with API2's bogen as much as possible (even though it's apparently a bit broken, and Builder has an additional optional parameter). Who knows how long it will take for API2's bogen to get fixed, or bogen_ellip be added.

From the programmer's perspective, bogen() is now more of a convenience function for bogen_ellip() (if anyone actually uses it!). I may go ahead and consolidate the code to have bogen() just be an alternate entry (with one radius) for bogen_ellip(). I want to keep bogen() around, just in case someone is using it.

If someone pulls in PDF::Builder::Bogen into a PDF::Builder application via PDF::SVG, is that going to risk any namespace collision with Content.pm's bogen() and bogen_ellip()? I appreciate your giving credit to me (and thanks for the pointer to the GNOME code, which got me much further along).

sciurius commented 1 year ago

Don't bother adding %opts to bogen, it was just a question.

PDF::Builder::Bogen is just a package (and namespace) that contains bogen, bogen_ellip and the helpers _arc2points and _arctocurve. As long as you don't start using the PDF::Builder::Bogen namespace there won't be clashes.

But there is a real risk of namespace clashes. Currently I have the following packages/namespaces:

PDF::SVG
SVG::Parser
SVG::CSS
SVG::Element
SVG::Circle (Rect,Ellipse etc)

There are several other modules on CPAN that clash. SVG::Parser and SVG::Element to name a few (probably more, didn't check all). Now I could (should?) bring them all under the PDF::SVG hierarchy but this leads to long names and Perl does (still) not provide a means to shorten packages, e.g.

use PDF::SVG::Element as Element;
...
$e = Element->new;

I haven't made up my mind how to solve this.
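For what it's worth, there is a common workaround for the missing "use ... as" syntax (it is the idea behind the CPAN module aliased): keep the long class name in a constant or a scalar. A minimal sketch, where PDF::SVG::Element is a stand-in class defined inline for illustration, not the real module:

```perl
use strict;
use warnings;

# Stand-in class for illustration only; not the real module.
package PDF::SVG::Element {
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
}

package main;

# Option 1: a constant whose value is the long class name.
# Perl resolves Element->new as Element()->new because a sub named
# Element exists in the current package.
use constant Element => 'PDF::SVG::Element';
my $e1 = Element->new( name => 'circle' );

# Option 2: a plain scalar holding the class name.
my $Element = 'PDF::SVG::Element';
my $e2 = $Element->new( name => 'rect' );

print ref($e1), " ", ref($e2), "\n";
```

This only shortens the name at the call site. The packages themselves still have to live under some unique namespace to avoid clashes on CPAN.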

PhilterPaper commented 1 year ago

I was thinking anyway of adding %opts to bogen to modernize its interface and make it more consistent with bogen_ellip. I think I can do it, but won't know for sure until I actually try it. I want to keep the old (optional parms) interface available, in case anyone is using it.

Regarding namespace collisions, that would preclude me from ever splitting out bogen, bogen_ellip, and possibly related routines into a PDF::Builder::Bogen. I'm a little uneasy about a "foreign" package burrowing into my package's namespace (PDF::Builder::*), which kind of places a burden on me not to use Bogen.pm. Is it really that much extra typing for an end user to type PDF::SVG::Element->new()? I see that two or three of your list are already in use on CPAN (at just the top level) -- it's conceivable that someone might use PDF::SVG and use SVG::Parser (for example) for other work, and suffer a collision.

sciurius commented 1 year ago

Your point is clear. I'll move the bogen code to some internal module so it won't get in the way of any future PDF::Builder::Bogen.

> Is it really that much extra typing for an end user to type PDF::SVG::Element->new()?

That is only part of the problem (in another package I have App::Music::ChordPro::Output::PDF::StringDiagram->new() -- you get ugly long lines). The other part is that in the case of PDF::SVG, SVG::Parser and SVG::CSS are unrelated to the PDF so it seems a bit weird to call them PDF::SVG::Parser...

But I'm afraid there's no other option.

sciurius commented 1 year ago

Ok, the package is renamed SVGPDF. Main package is SVGPDF.pm, and lots of subpackages. The bogen code is in package SVGPDF::Contrib::Bogen.

I've added some 40 SVG sample files as regression tests, that all pass for PDF::Builder and PDF::API2. There is one test that fails with perl 5.30 and before, possibly due to rounding errors in the math library. I have to investigate.

PhilterPaper commented 1 year ago

Sounding good. I look forward to the SVGPDF package on CPAN. No great rush -- I have a bunch of stuff to get into PDF::Builder 3.026 Real Soon Now (I hope!). Then, the decks will be clear to add lots of SVGPDF-based stuff. I appreciate your efforts.

I support Perl 5.26 and up for PDF::Builder, so it would be nice to have that last example working properly. If it's really an edge or corner case, consider leaving it out for the initial release. Does it happen only in specific Perl builds (particularly with extended math libraries)? I have seen that kind of thing before, where t-tests failed due to different levels of available precision (more is not always better!). I ended up rounding some math results to "standard" double-precision (or even to single-precision) so they would pass the tests regardless of the Perl build.

sciurius commented 1 year ago

The lowest Perl version I support is 5.26 so no sweat.

The problem has been located:

printf "%.2f\n", -1.785;

This prints "-1.78" with 5.30 and up, and "-1.79" with 5.28 and lower. The difference plays no serious role, it is just that the debugging statements use %.2f format, and therefore the output fails to match the expected result.
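One way to keep such a test stable across Perl versions is to compare numbers within a tolerance instead of comparing formatted strings. A minimal sketch; nearly_equal and its default tolerance are choices made for this example, not part of any test library:

```perl
use strict;
use warnings;

# Compare two numbers within a tolerance instead of comparing their
# formatted strings, so a last-digit rounding difference cannot matter.
sub nearly_equal {
    my ($got, $want, $tol) = @_;
    $tol //= 0.01;
    return abs($got - $want) <= $tol ? 1 : 0;
}

my $v = -1.785;
printf "%%.2f  gives %.2f\n",  $v;   # -1.78 or -1.79, depending on the Perl
printf "%%.17g gives %.17g\n", $v;   # the double actually stored
printf "%%a    gives %a\n",    $v;   # the same value in hexadecimal

print "within tolerance\n" if nearly_equal(sprintf("%.2f", $v), $v);
```

The %.17g and %a lines are handy for checking which double a literal was actually converted to on a given build.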

PhilterPaper commented 1 year ago

Odd. I would say that's a fairly serious bug, unless pre-5.30 it wasn't rounding to specification and they fixed it in 5.30. When dealing with negative numbers, you've got truncation, round away from zero, and round towards zero. Not to mention what to do when the first digit to drop is a 5 (round to odd or round to even). For someone who wants consistent results across a range of Perl versions, that's nasty to change specs like that (if they did).

sciurius commented 1 year ago

You seem to be good at math... My math is extremely rusty.

Do you know how to convert an SVG transformation matrix to a PDF matrix, given that in the PDF the y coordinate must be flipped? I.e., the SVG coordinate system runs from top left 0,0 to bottom right 100,100, the corresponding PDF coordinates are translated 0,100 so top left becomes 0,0 and bottom right is 100,-100.

PhilterPaper commented 1 year ago

If your math is extremely rusty, mine has long since corroded away! Anyway, this might be better discussed by telling what sort of transformations are being done in SVG, that you need to transfer to PDF. If it's just a matter of flipping the coordinate system over (so 0,0 is at the top left instead of bottom left, and y grows downward instead of upward), I think you need to do two things: Y-scale of -1 (X-scale is +1), and Y-translate up by some amount less than media size (X-translate is 0). Presumably you're not doing skews or rotations (leave them 0). I'm not sure if you'll need to reverse (negate) a skew or rotation angle. Shouldn't your resulting bottom right be 100,0 rather than 100,-100 (so it will be visible on the page)? Or perhaps with some positive offset to get it to the top of the page?

SVG                                                        PDF
0,0                                        100,0           0,100                                   100,100
   10,10                 50,10                              10,90                  50,90        
     +---------------------+                                   +---------------------+
     |                     |             90,50                 |                     |             90,50
     |                50,50+---------------+                   |                50,50+---------------+
     |                                     |                   |                                     |
     +-------------------------------------+                   +-------------------------------------+
   10,90                                 90,90               10,10                                 90,10
0,100                                    100,100           0,0                                      100,0

I think the PDF matrix for this would be [ 1 0 0 -1 0 100 ] (the last entry being the height of the area, 100 in this diagram), although it's possible the figure may have the wide part at the top. If it does end up flipped like that, I'm not sure you can use a transformation matrix. I think you're just going to have to do some trial-and-error experimenting to get the desired results. If you're working with a general purpose SVG transformation matrix that you want to map to a PDF transformation matrix, that's a whole 'nuther animal.
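The flip can be checked numerically. The sketch below assumes the 100-unit-high area from the diagram, so the translation term is 100, and applies a PDF-style matrix [ a b c d e f ] to the diagram's corner points. apply_matrix is a helper written for this example, not a PDF::Builder call.

```perl
use strict;
use warnings;

# Apply a PDF-style matrix [a b c d e f] to a point:
#   x' = a*x + c*y + e,   y' = b*x + d*y + f
sub apply_matrix {
    my ($m, $x, $y) = @_;
    return (
        $m->[0] * $x + $m->[2] * $y + $m->[4],
        $m->[1] * $x + $m->[3] * $y + $m->[5],
    );
}

my $height = 100;                       # height of the area in the diagram
my @flip   = (1, 0, 0, -1, 0, $height); # Y-scale -1, Y-translate by height

for my $p ([10, 10], [50, 10], [90, 50], [90, 90], [10, 90]) {
    my ($px, $py) = apply_matrix(\@flip, @$p);
    printf "SVG %d,%d -> PDF %d,%d\n", @$p, $px, $py;
}
```

Every y becomes height minus y while x is unchanged, which matches the corner labels in the diagram.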

I recall a few years ago dealing with some sort of PDF editors that left the coordinate system flipped over (and some oddball page height), resulting in anything additional written in PDF::Builder being upside down or something like that. The solution was to first add low level PDF commands to set the transform matrix to reverse and offset Y -- I think it's described in Content's POD under Advanced Methods.

sciurius commented 1 year ago

The initial transform depends on the viewBox. If the viewbox is 0,0,width,height then the transform is translate(0,height). 0,0 will be topleft and all subsequent y coordinates must be negated.

The basics are simple:

SVG (x,y) → PDF (x,-y)
SVG translate dx,dy → PDF translate dx,-dy
SVG scale sx,sy → PDF scale sx,sy
SVG rotate x → PDF rotate -x
SVG skew x,y → PDF skew -y,-x

but the challenge is to process an arbitrary SVG transformation matrix.

SVG matrix a,b,c,d,e,f → PDF matrix ????

Flipping the Y coordinate by using scale(1,-1) does not really work, since that also affects texts which will become upside down. However if there's no alternative I think I will have to use scale(1,-1) and apply another scale(1,-1) to the texts only. Doable but a lot of work. Changing selected pluses into minuses and vice versa all over the place is very error-prone.
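Under the "negate every y" convention, one way to derive the general case is by substitution: SVG matrix(a,b,c,d,e,f) maps x' = a*x + c*y + e and y' = b*x + d*y + f, and writing Y = -y on the PDF side gives the matrix (a, -b, -c, d, e, -f). This is a sketch of that derivation only, not necessarily what SVGPDF does internally; both function names are invented for the example.

```perl
use strict;
use warnings;

# With X = x and Y = -y, substituting into the SVG equations gives
#   X' =  a*X - c*Y + e
#   Y' = -b*X + d*Y - f
# so b, c and f change sign while a, d and e stay as they are.
sub svg_to_pdf_matrix {
    my @m = @_;
    return ($m[0], -$m[1], -$m[2], $m[3], $m[4], -$m[5]);
}

# Apply a six-element matrix [a b c d e f] to a point.
sub transform_point {
    my ($m, $x, $y) = @_;
    return (
        $m->[0] * $x + $m->[2] * $y + $m->[4],
        $m->[1] * $x + $m->[3] * $y + $m->[5],
    );
}

my @svg = (1, 2, 3, 4, 5, 6);
my @pdf = svg_to_pdf_matrix(@svg);

my ($sx, $sy) = transform_point(\@svg, 7,  8);   # point in SVG space
my ($px, $py) = transform_point(\@pdf, 7, -8);   # same point, y negated
print "SVG result: $sx,$sy   PDF result: $px,$py\n";
```

The PDF result should equal the SVG result with y negated. The special cases are consistent with the list above: translate dx,dy becomes dx,-dy, scale is unchanged, and a rotation changes the sign of its angle.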

BTW:

On Thu, Aug 17, 2023 at 10:45 PM tux wrote:
As mentioned on Amsterdam IRC by Johan (summarized):

 sciurius: 'printf "%.2f\n", -1.785;' prints "-1.78" for >= 5.30 and "-1.79" for <= 5.28

I see the same on 64-bit builds of perl on Windows.
According to my checks against the mpfr library, the result, as given by 5.30 is correct, and the result as given by 5.28 is incorrect.

I'm finding that on perl-5.28, the value -1.785 is being assigned incorrectly:

>perl -wle "printf '%a', -1.785;"
-0x1.c8f5c28f5c29p+0

"%.2f" formatting correctly outputs that incorrect value (-1.7850000000000001)  as -1.79.

whereas perl-5.30 assigns the value correctly:

>perl -wle "printf '%a', -1.785;"
-0x1.c8f5c28f5c28fp+0

"%.2f" formatting correctly outputs that correct value (-1.7849999999999999) as -1.78.

I expect that  "%a" formatting will reveal the same anomaly on the system Johan was using.

Cheers,
Rob

sciurius commented 1 year ago

Update: Changing the pluses and minuses turned out to be less of a problem than I anticipated... And the matrix transform is now functional. Thanks a lot for the hint!

PhilterPaper commented 1 year ago

> And the matrix transform is now functional.

Yea! BTW, the SVG and PDF specs both detail in what order transformations should be applied, at least if specified as separate calls. I'm not sure what the rules are if a transformation matrix is used. Anyway, be careful to check that you're not assuming a particular order of operations.

Regarding the rounding issue, if Perl has specified all along the intended rounding behavior, and it wasn't implemented correctly until 5.30, then that's a bug in pre-5.30 implementations. Still, it's annoying that such important behavior is changed/fixed right in the middle of the 5.x release stream. It's a bug in 5.30+ if it met specs (if given) before and now doesn't, and the spec should not have been changed mid-stream. Note that Perl may or may not have met IEEE-754 floating point specs before (or now), and might have had its own f.p. spec (a bad idea... IEEE-754 has been around forever)*. Further note that there are many extended f.p. libraries around which add extra digits of precision at the cost of incompatibility with other extended libraries, and might even affect "standard" precision work. That's what I ran into several years ago -- some Perls are built with one or another extended f.p. library, which will produce varying results, although I have never seen rounding issues such as you describe.

* Almost everyone has used IEEE-754 in hardware for quite some time, except very old architectures (e.g., IBM s/360 family).

PhilterPaper commented 1 year ago

JV, I see in "samples.pdf" PDF output from the equation SVG's that appears to include vertical extents and a baseline (vertical). In other words, the descender and ascender values. I just want to confirm that I'll have this information available to me when your package returns a PDF xo. I will need it to vertically align the returned image with the text baseline when doing inline equations, and want to make sure this vital information will still be present. Thanks!

sciurius commented 1 year ago

The result depends on what the SVG provides. For the MathJax formulae:

     viewBox="0 -1749.5 43414.9 2999"

The baseline is at y=0, so there is an ascender of 1749.5 and a descender of 1249.5 (2999 - 1749.5). You can see it in this picture, generated with a debugging option.
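The arithmetic can be sketched as follows, assuming (as for the MathJax output here) that the baseline sits at y=0 in viewBox units. Both helper names are made up for this example and are not SVGPDF functions.

```perl
use strict;
use warnings;

# Split a viewBox attribute into its four numbers:
# min-x, min-y, width, height.
sub parse_viewbox {
    my ($vb) = @_;
    my @n = map { $_ + 0 } grep { length } split /[\s,]+/, $vb;
    die "viewBox needs four numbers\n" unless @n == 4;
    return @n;
}

# Assumes the baseline is at y = 0, as in the MathJax output above.
sub baseline_metrics {
    my ($vb) = @_;
    my (undef, $min_y, undef, $height) = parse_viewbox($vb);
    my $ascender  = -$min_y;             # extent above the baseline
    my $descender = $height + $min_y;    # extent below the baseline
    return ($ascender, $descender);
}

my ($asc, $desc) = baseline_metrics("0 -1749.5 43414.9 2999");
print "ascender $asc, descender $desc\n";
```

These values are in viewBox units. To align with a text baseline they still have to be scaled by the ratio of the rendered height to the viewBox height.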

x.pdf

PhilterPaper commented 1 year ago

Inline equations via MathJax are the only place I can think of that would need vertical alignment on a baseline.

sciurius commented 1 year ago

Example of how to inline a MathJax SVG. mj.zip

PhilterPaper commented 1 year ago

Looks great!

"Display" equations (centered, optional tag at right margin) will simply be treated as SVG images. Let me know how you make out with the recursive <svg> tags etc... if I don't specify an optional tag, it omits the nested <svg> (I can handle a tag separately). It may still be doing some funny stuff to center the equation on the page (or column)... you mentioned finding odd viewbox settings. If you reach an impasse, I'll have to see what I can do about automatically modifying the produced SVG code. Or, you could do it in your library, as long as it's carefully defined what modifications you're doing. I can pass to you whether the input SVG file/string is used as a plain image, an inline equation, or a display equation; if that would be useful to you.

sciurius commented 1 year ago

Progress...

In this PDF I have manually tweaked the positioning, to make sure that the formula is correctly rendered and that nesting of SVGs works correctly. The remaining problem is to deal with the combination of page size, vertical-align, min-width, width, height and the viewBox. As a consequence, these SVGs must be rendered with given dimensions so the result looks as designed (responsive design).

For example (outer SVG): The viewBox is 21707.5 -1749.5 1 2999. Min-width = 109.889ex and height=6.785ex. The specified width of the viewBox (1) then becomes min-width / height * 2999 → 48619.6. This would look similar to

[ xxxxxxxxxxxxxxxxxxxxxxxxx  nn ]

where the xx are the formula, and nn the label. When rendered at 1280 width this will look like:

[         xxxxxxxxxxxxxxxxxxxxxxxxx          nn ]

x.pdf

PhilterPaper commented 1 year ago

You may be getting too elaborate with this. I presume that you noticed that your example has too narrow a page width, and the tag overlaps the equation. A PDF page is not dynamically resized, as an HTML page can be, so I think it should be the responsibility of whatever program is assembling the page to check the available column width against the returned SVG image's width, and decide what to do about it. Most likely, like any other image, if it's too wide it will simply be centered and the left and right edges will extend beyond the column bounds and possibly off-page, requiring the author to shrink or split up the equation, or even span two or more columns on the page. All that's required of you is to tell me the height and width (in points) of the returned XO, as well as any baseline position information (particularly for an inline equation).
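The check described here is simple arithmetic on the caller's side. A minimal sketch, with all names and the example numbers chosen for illustration (widths in points):

```perl
use strict;
use warnings;

# Center an object of width $obj_w in a column that starts at $col_x and
# is $col_w wide. Returns the x position of the object's left edge and
# how far it overhangs the column on each side (0 if it fits).
sub center_in_column {
    my ($col_x, $col_w, $obj_w) = @_;
    my $x        = $col_x + ($col_w - $obj_w) / 2;
    my $overhang = $obj_w > $col_w ? ($obj_w - $col_w) / 2 : 0;
    return ($x, $overhang);
}

my ($x, $overhang) = center_in_column(72, 468, 500);
print "place at x=$x", ($overhang ? " (overhangs by $overhang on each side)" : ""), "\n";
```

A non-zero overhang is the signal for the page-assembly code, or the author, to shrink or split the equation.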

I can strip off the tag and handle it separately, if need be, and I can tell you whether or not this is an inline equation (if necessary for dealing with baseline/ascender/descender heights). Of course, if you enjoy the technical challenge of doing this, go right ahead!

sciurius commented 1 year ago

Well, I enjoyed the technical challenge for a while but now I'm giving up. Attached how close I got... x.pdf

The interaction of nested viewBox, width, height, bbox and preserveAspectRatio is, for me, now, unfathomable. Also, I made a wrong design decision that gets in the way in precisely this case (but not for 'normal' SVGs). I currently do not feel like redoing that.

So I'm advancing to the next step: using the renderer in production.

PhilterPaper commented 1 year ago

If nested <svg>'s and funny viewBoxes are causing trouble in display equations, I think that all I have to do is create the SVG without a tag and it will be clean.

Here's a new equation to try:
mjIL inline format
mjDT display format, with tag, nested svg's and wonky viewBox
mjDNT display format, no tag

sciurius commented 1 year ago

No problems with IL and DNT. x.pdf

PhilterPaper commented 1 year ago

> No problems with IL and DNT.

OK, unless you tell me otherwise in the next few months, I will plan to strip the \tag{} off of equation input and handle it separately. A "display" equation (as SVG image) will be horizontally centered on the page. It will not dynamically reposition, since PDF pages are of fixed width. I think that should be adequate.

By the way, I understand that some SVG's can include raster graphic images. Do you have any plans for those? I don't think it's all that important, at least for an initial release, but eventually someone will request it.

sciurius commented 1 year ago

I already support PNG and JPG as links

<image href="image.png" x="0" y="0" height="50px" width="50px"/>

and inline data

<image href="data:image/png;base64,iVBOR...ErkJggg==" x="0" y="0" height="50px" width="50px"/>
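Telling the two href forms apart can be sketched like this. It is an illustration using the core MIME::Base64 module, not SVGPDF's actual code, and the function name and hash keys are invented for the example.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Classify an <image> href as either inline base64 data or a file link.
# Returns a hash ref; the key names are this sketch's own choice.
sub resolve_href {
    my ($href) = @_;
    if (my ($mime, $b64) = $href =~ m{^data:(image/[\w.+-]+);base64,(.*)\z}s) {
        return { kind => 'data', mime => $mime, bytes => decode_base64($b64) };
    }
    return { kind => 'file', path => $href };
}

my $link = resolve_href("image.png");
print "$link->{kind}: $link->{path}\n";
```

For inline data the decoded bytes would then go to an image loader that accepts in-memory data or a temporary file, chosen by MIME type.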

PhilterPaper commented 12 months ago

At this point, assuming Johan's SVG-to-PDF library comes to fruition, I will drop any further efforts on my SVG::Reader library. I don't need SVG for any purpose other than PDF output, so there will be no need for an SVG-to-generic-graphics converter, nor support in PDF::Builder for this "generic" format. I may keep what I've done so far around for a while, hoping that someone else will pick it up if they need such a converter. If not, at some future point I may just delete it.

sciurius commented 11 months ago

What's brewing...? As of the next release, Text::Layout will support inline images. E.g. the marked up text string

Abc<img src="alert.svg" />def

results in

[screenshot: scrot20231009113646]

Attributes can control appearance, e.g. dy (vertical displacement) and w (advance width).

Abc<img src=alert.svg dy=20 w=0/>def

[screenshot: scrot20231009114721]

PhilterPaper commented 9 months ago

Hi Johan,

How is SVGPDF coming along? I periodically check your GitHub repositories, and haven't seen anything. Are you still working on it? I just released PDF::Builder 3.026 and will be looking soon to add SVG support for 3.027. Please let me know if you have decided not to proceed with an SVG-to-PDF library, in which case I will have to resume work on my own :-( . The work you showed me looked very promising, doing everything I need (so far) to do for PDF::Builder, and I'm hoping to see it as a released Perl library!

By the way, I know you do a lot of work with musical notation. I thought you might find this video of interest: https://www.youtube.com/watch?v=Eq3bUFgEcb4 .

sciurius commented 9 months ago

Hi Phil,

The library is alive and kicking and in full use in ChordPro with very good results.

I kept the repo private to avoid early forks and potential problems with changing the API. I hope the API is now stable, and I have made the repository perl-SVGPDF public so you can access it.

No doubt there will be some wrinkles to iron out...

PhilterPaper commented 9 months ago

Thanks for making SVGPDF visible to me. I copied it over OK, but when I try running it, it prereqs File::LoadLines. This package does not successfully install on Windows 10 (Strawberry Perl 5.32 or 5.38). The errors in t/13-nochomp.t appear to be the same ones as in the CPAN Testers Matrix for Windows. CPAN refused to let me open an RT ticket against File::LoadLines (403 Forbidden, when I tried to submit a new ticket).

sciurius commented 9 months ago

Sorry for the inconvenience... I use File::LoadLines extensively, also on windows, and it seems to work fine. I'll check the tests.

You can safely install with cpan -f -i File::LoadLines.

Issues → https://github.com/sciurius/perl-File-LoadLines/issues

sciurius commented 9 months ago

FYI -- I've uploaded File::LoadLines 1.041 to CPAN. This should build fine on Windows.

Thanks for the 'Notation must die' link. I've enjoyed it.

PhilterPaper commented 4 months ago

Another music-related video you might be interested in: https://www.youtube.com/watch?v=Qct6LKbneKQ . It is about the development process for MuseScore4, including notation/engraving, by the same guy who did the Notation video. MuseScore is not only printable notation (engraving) but also interactive editing and a MIDI/VST playback interface.

PhilterPaper commented 3 months ago

I am pleased to announce that SVG support has been added to PDF::Builder. It is in GitHub now, and will soon be released on CPAN as 3.027. I'm sure that there will be updates to this massive code change, but the basic functionality appears to be working now.

Many thanks to Johan Vromans (@sciurius) for his work in implementing the SVGPDF package to support SVG processing!