Open davidhedley opened 10 years ago
Interesting.
My tests show that pdftocairo works a little better than pdftoppm (without other parameters), but it can still lose thin lines in your PDF.
Actually, pdftoppm has an option -thinlinemode whose values can be none | solid | shape, and solid is exactly what your patch is trying to do. shape seems to use gray levels to represent lines thinner than 1px. However, none is the default, which can lose thin lines. Maybe we want to expose this option to pdf2htmlEX users. @davidhedley, would you like to do this?
The Cairo backend is already used by pdf2htmlEX for SVG background output, so I think it is not hard to do (1). However, pdftocairo doesn't have -thinlinemode or anything similar, so pdftoppm and the Splash backend still look superior as far as bitmap output is concerned.
I tried pdftocairo 0.24.5 and it lost some lines on the example PDF, but testing with pdftocairo 0.26.3, they were all present so I assumed this issue had been fixed.
I didn't realise about the thinlinemode in pdftoppm - that seems like it would be the best solution. Does it really need to be an option or should we just set it to "shape" by default?
And in fact the thinlinemode produces a much better result. So in SplashBackgroundRenderer we just change:
SplashBackgroundRenderer::SplashBackgroundRenderer(const string & imgFormat, HTMLRenderer * html_renderer, const Param & param)
: SplashOutputDev(splashModeRGB8, 4, gFalse, (SplashColorPtr)(&white), gTrue, gTrue)
, html_renderer(html_renderer)
, param(param)
, format(imgFormat)
to
SplashBackgroundRenderer::SplashBackgroundRenderer(const string & imgFormat, HTMLRenderer * html_renderer, const Param & param)
: SplashOutputDev(splashModeRGB8, 4, gFalse, (SplashColorPtr)(&white), gTrue, gTrue, splashThinLineShape)
, html_renderer(html_renderer)
, param(param)
, format(imgFormat)
Is there any reason why you would not want to enable thinlinemode?
I'm OK with having -thinlinemode shape by default, but others may prefer solid. I don't know why pdftoppm lets none be the default; maybe it is faster?
My poppler is 0.26.1, which seems a little outdated.
Actually the default for Splash is not strictly "none". From SplashTypes.h:
enum SplashThinLineMode {
  splashThinLineDefault, // if SA on: draw solid if requested line width, transformed into
                         // device space, is less than half a pixel and a shaped line else
  splashThinLineSolid,   // draw line solid at least with 1 pixel
  splashThinLineShape    // draw line shaped at least with 1 pixel
};
So default behaviour is dependent on the Stroke Adjustment setting in the PDF. However I guess if SA is off, then nothing happens to thin lines and they get dropped which is not good.
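To make the three modes concrete, here is a minimal sketch of the decision as the enum comments describe it. It is not Splash's actual code: ThinLineMode, Stroke and classifyStroke are hypothetical names, the 1-pixel cutoff for "thin" is an assumption read from "at least with 1 pixel", and the SA-off branch encodes the guess above that nothing is done to the line.

```cpp
#include <cassert>

// Local mirror of the enum quoted above, so the sketch compiles without poppler headers.
enum ThinLineMode { ThinLineDefault, ThinLineSolid, ThinLineShape };

// How a stroke ends up being drawn in this simplified model (hypothetical type).
enum class Stroke { Unadjusted, Solid, Shaped };

// deviceWidth: requested line width after transformation into device space, in pixels.
Stroke classifyStroke(ThinLineMode mode, bool strokeAdjust, double deviceWidth) {
    assert(deviceWidth >= 0.0);
    // Assumption: lines of 1 pixel or more are not treated as thin.
    if (deviceWidth >= 1.0) return Stroke::Unadjusted;
    switch (mode) {
    case ThinLineSolid:
        return Stroke::Solid;   // solid, at least 1 pixel
    case ThinLineShape:
        return Stroke::Shaped;  // shaped, at least 1 pixel
    case ThinLineDefault:
        // Assumption from the thread: with SA off nothing is done, so the line may vanish.
        if (!strokeAdjust) return Stroke::Unadjusted;
        return deviceWidth < 0.5 ? Stroke::Solid : Stroke::Shaped;
    }
    return Stroke::Unadjusted;
}
```

In this model a 0.3px line survives in every case except default mode with SA off, which matches the dropped lines seen in the example PDF.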
I'm doing some testing now, but it would seem that splashThinLineShape produces good results - much more uniform line weights than the "solid" setting.
According to the PDF spec:
10.6.4 Scan Conversion Rules ... A shape shall be scan-converted by painting any pixel whose square region intersects the shape, no matter how small the intersection is. This ensures that no shape ever disappears as a result of unfavourable placement relative to the device pixel grid, as might happen with other possible scan conversion rules. The area covered by painted pixels shall always be at least as large as the area of the original shape. This rule applies both to fill operations and to strokes with nonzero width. Zero-width strokes may be done in an implementation-defined manner that may include fewer pixels than the rule implies. ...
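As an illustration of why this rule matters for thin lines, the sketch below counts painted pixels for an axis-aligned rectangle under two rules: the "any intersection" rule from the quote, and a pixel-centre sampling rule used here only for contrast. Rect and both function names are hypothetical, and neither function is claimed to be what Splash or Cairo actually does. Pixel i is taken to cover the interval [i, i+1), and touching only at an edge is not counted as an intersection.

```cpp
#include <cassert>
#include <cmath>

// Axis-aligned rectangle in device space (pixels); requires x0 < x1 and y0 < y1.
struct Rect { double x0, y0, x1, y1; };

// Rule from the quote: paint every pixel whose unit square overlaps the shape,
// no matter how small the overlap is.
long pixelsAnyIntersection(const Rect& r) {
    assert(r.x0 < r.x1 && r.y0 < r.y1);
    long w = static_cast<long>(std::ceil(r.x1) - std::floor(r.x0));
    long h = static_cast<long>(std::ceil(r.y1) - std::floor(r.y0));
    return w * h;
}

// Contrast rule: paint a pixel only if its centre (i + 0.5) lies inside the shape.
long pixelsCentreSampled(const Rect& r) {
    assert(r.x0 < r.x1 && r.y0 < r.y1);
    long w = static_cast<long>(std::ceil(r.x1 - 0.5) - std::ceil(r.x0 - 0.5));
    long h = static_cast<long>(std::ceil(r.y1 - 0.5) - std::ceil(r.y0 - 0.5));
    return w * h;
}
```

For a horizontal line 10px long and 0.2px thick, Rect{0.0, 2.1, 10.0, 2.3}, the first rule paints one full row of pixels while centre sampling paints none, which is the kind of disappearance the spec text rules out.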
It seems the spec doesn't allow any shape to be dropped, regardless of whether "Stroke Adjustment" is on. So it is still a bug in the Splash backend.
If there's a line of width 0.5px, and we zoom the PDF file by 2x when converting it to html, will it be 1px or 2px in the output?
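The answer depends on where the 1-pixel minimum is applied relative to the zoom. The sketch below only works through the arithmetic for both orders; the function names are hypothetical, and which order the Splash thin line modes actually use is not established in this thread (the enum comment's "transformed into device space" hints at the first, but that is unverified).

```cpp
#include <algorithm>
#include <cassert>

// Minimum applied to the device-space width, i.e. after the zoom.
double clampAfterScale(double userWidth, double scale) {
    assert(scale > 0.0);
    return std::max(1.0, userWidth * scale);
}

// Minimum applied in user space first, then zoomed.
double clampBeforeScale(double userWidth, double scale) {
    assert(scale > 0.0);
    return std::max(1.0, userWidth) * scale;
}
```

With a 0.5px line and a 2x zoom, clampAfterScale(0.5, 2.0) gives 1px and clampBeforeScale(0.5, 2.0) gives 2px.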
I think the best solution would be to create a new option, and perhaps set its default value to shape.
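A rough sketch of what parsing such an option could look like, assuming it takes the same none | solid | shape values as pdftoppm's -thinlinemode. ThinLineMode is a local stand-in for SplashThinLineMode and parseThinLineMode is a hypothetical helper; how the value would be wired into pdf2htmlEX's Param is left open, and mapping none to the default mode is an assumption.

```cpp
#include <string>

// Local stand-in for SplashThinLineMode so the sketch compiles without poppler headers.
enum ThinLineMode { ThinLineDefault, ThinLineSolid, ThinLineShape };

// Parses an option value; returns false and leaves `out` untouched if it is not recognised.
bool parseThinLineMode(const std::string& value, ThinLineMode& out) {
    if (value == "none")  { out = ThinLineDefault; return true; } // assumption: none = default mode
    if (value == "solid") { out = ThinLineSolid;   return true; }
    if (value == "shape") { out = ThinLineShape;   return true; }
    return false;
}
```

A caller would initialise the mode to ThinLineShape to get the proposed default, overwrite it only when the user passes a valid value, and report an error otherwise.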
The poppler-splash backend seems to have problems rendering thin lines in PDFs. See http://download.vistair.com/pdf2htmlEX/thinlines.pdf for an example PDF. The output from pdftoppm -png is shown in http://download.vistair.com/pdf2htmlEX/thinlines-splash.png
Interestingly, the poppler-cairo backend does not have this issue (pdftocairo -png output shown in http://download.vistair.com/pdf2htmlEX/thinlines-cairo.png).
Therefore, to fix this issue (apart from reporting it to poppler), there are 2 options:
I have implemented a patch for (2.) which works in this case. In SplashBackgroundRenderer.cc, I have done the following:
This will ensure the (transformed) line width is at least 1 unit wide (in user space). This is obviously a bit of a hack and the fix should really be done at the rasterization stage, but it works well for me and successfully renders the example page.