dbuenzli / vg

Declarative 2D vector graphics for OCaml
http://erratique.ch/software/vg
ISC License

PDF size when using OTF fonts. #36

Open pveber opened 4 months ago

pveber commented 4 months ago

vg includes OTF fonts when it generates PDF files. This leads to rather large files, compared to what can be obtained with e.g. matplotlib, where only glyphs that are actually used in the document are included. I'd like to have a try at this, would you consider such a contribution?

dbuenzli commented 4 months ago

Why not, but I'm not sure font subsetting is an entirely trivial task.

So before getting into subsetting, I'd rather have basic compression, which would also be beneficial for the vector data of images, which sometimes also grows quite big with the current renderer[^1]. Font data also compresses quite well. Did you try to do the transform mentioned here on the files you generate? Would the results satisfy you?

If that is the case, I think it would be easier and bring more benefits to extend the PDF renderer with an optional argument: a `string -> (string, string) result` function (which can be plugged e.g. with `Zipc_deflate.deflate` or your favourite deflate implementation) that, when present, is used to deflate the object streams.
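As a rough illustration of that idea, here is a minimal sketch of how a stream object could be written with an optional compression function. The names `stream_obj`, `compress` and `id` are hypothetical and are not part of `Vgr_pdf`; the sketch also assumes the supplied function returns data in the format `/FlateDecode` expects, and that falling back to the uncompressed stream on error is acceptable.

```ocaml
(* Hypothetical helper, not part of Vgr_pdf: serialises one PDF stream
   object, optionally compressing its contents with a user-supplied
   function of type string -> (string, string) result. *)
let stream_obj ?compress ~id contents =
  let filter, data = match compress with
  | None -> "", contents
  | Some deflate ->
      match deflate contents with
      | Ok z -> " /Filter /FlateDecode", z
      (* Assumption: on error we silently keep the uncompressed stream. *)
      | Error _ -> "", contents
  in
  Printf.sprintf
    "%d 0 obj\n<< /Length %d%s >>\nstream\n%s\nendstream\nendobj\n"
    id (String.length data) filter data
```

Without `~compress` the output is the same as for an uncompressed renderer, so existing callers would not be affected. Whether an `Error` should instead be reported to the caller is a design choice left open here.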

What do you think?

[^1]: In one of my uses of vg I generate 129MB pdfs which after stream compression via the gs rune are reduced to ~15MB pdfs

pveber commented 4 months ago

Thanks for your feedback!

> Did you try to do the transform mentioned here on the files you generate? Would the results satisfy you?

Damn, I missed that paragraph. On my use case, the document shrinks from 2.4 MB to 517 KB with cpdf and to 62 KB with gs. This is super nice, and it comes for free!

> What do you think?

For my current need, the case is settled. I think both compression and subsetting would be nice, in particular for the numerous crowd (of which I'm a sorry representative) that does not read the manual until the end. Also, it might not be convenient to have to perform an external call to get the PDF right. Your proposal seems lighter to implement than subsetting and is nicely composable; it looks promising to me. On the other hand, subsetting does look important to me: for instance, in the DejaVu font I counted more than 3000 glyphs, while in a typical plot I'm unlikely to see more than 30 effectively used. With compression only (see the figures above, using cpdf), the generated files will still remain abnormally large.

Having looked at the code in Vgr_pdf, it's true that subsetting requires quite some changes. I'll have a look at it next week and report on how it went.

Thanks again Daniel!

dbuenzli commented 4 months ago

> On the other hand subsetting does look important to me: for instance in the DejaVu font, I counted more than 3000 glyphs, while in a typical plot I'm unlikely to see more than 30 effectively used.

I think you should be able to evaluate the gain with the -dSubsetFonts option of gs.

> Having looked at the code in Vgr_pdf, it's true that subsetting requires quite some changes.

I'm not sure that's the complicated bit; I expect that you should simply add state to the renderer that collects glyph ids per font.
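A minimal sketch of what such state could look like, assuming fonts can be keyed by a name string and glyph ids are plain integers. The names `used_glyphs`, `add_glyphs` and `glyphs` are hypothetical and do not exist in `Vgr_pdf`.

```ocaml
module Int_set = Set.Make (Int)
module String_map = Map.Make (String)

(* Hypothetical renderer state: the set of glyph ids seen so far,
   keyed by font name. *)
type used_glyphs = Int_set.t String_map.t

let empty : used_glyphs = String_map.empty

(* Record that the glyph ids [gids] were drawn with [font]. *)
let add_glyphs font gids (used : used_glyphs) : used_glyphs =
  let add = function
  | None -> Some (Int_set.of_list gids)
  | Some set -> Some (List.fold_left (fun s g -> Int_set.add g s) set gids)
  in
  String_map.update font add used

(* Glyph ids recorded for [font], in increasing order without duplicates. *)
let glyphs font (used : used_glyphs) =
  match String_map.find_opt font used with
  | None -> []
  | Some set -> Int_set.elements set
```

The renderer would call something like `add_glyphs` each time it emits a glyph run, and read the per-font sets back when it writes the font objects at the end of the document.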

The complicated bit is rather how to do it correctly in PDF. There is a bit about it in §9.6.4 of ISO 32000-1:2008. The other problem is that you will likely need to re-encode OpenType tables, which otfm doesn't support at the moment, and make sure all the tables required by PDF are there (see table 126).

pveber commented 4 months ago

> I think you should be able to evaluate the gain with the -dSubsetFonts option of gs.

Alas, it seems gs ignores -dSubsetFonts for PDF files (that's what I observe too).

Right, maybe it's a bit more than I can chew at the moment. Maybe a useful intermediate step would be to add an encoder to otfm?