Problem with printing weasyprint generated PDF

edkirin commented 6 years ago

I need to print a PDF file generated with WeasyPrint through the CUPS PDF printer on a Linux machine. I know it's redundant, but that's the requirement.

The problem is that printing a PDF file generated with WeasyPrint produces a crippled PDF file.

Here are the files. generated_by_weasyprint.pdf printed_with_cups.pdf

I'm printing the file with: $ lp -d PDF generated_by_weasyprint.pdf

eden@sunce:/tmp> pdfinfo generated_by_weasyprint.pdf 
Title:          {{ title }}
Keywords:       
Author:         
Creator:        cairo 1.14.6 (http://cairographics.org)
Producer:       WeasyPrint 0.42 (http://weasyprint.org/)
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          1
Encrypted:      no
Page size:      595 x 841 pts
Page rot:       0
File size:      97265 bytes
Optimized:      no
PDF version:    1.3

The same problem appears when printing the same file from Python using the cups library. I guess WeasyPrint generates an invalid PDF file somehow. I didn't have this problem with WeasyPrint v0.40 and earlier.

Environment:

Linux Mint 18.2
python 3.5.2
weasyprint 0.42
cups-pdf 2.6.1
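
The `lp` invocation above can also be driven from a script, which makes it easier to repeat the test against several files. This is only a minimal sketch, assuming the CUPS queue is named `PDF` as in the command above; `build_lp_command` and `print_pdf` are hypothetical helper names, not part of CUPS or WeasyPrint.

```python
import shutil
import subprocess


def build_lp_command(pdf_path, printer="PDF"):
    """Return the argument list for `lp -d <printer> <file>`."""
    return ["lp", "-d", printer, pdf_path]


def print_pdf(pdf_path, printer="PDF"):
    """Send a PDF to a CUPS queue and return lp's output as text."""
    if shutil.which("lp") is None:
        raise RuntimeError("lp not found; is the CUPS client installed?")
    result = subprocess.run(
        build_lp_command(pdf_path, printer),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=True,  # raise CalledProcessError if lp exits with an error
    )
    return result.stdout.decode(errors="replace")
```

Calling `print_pdf("generated_by_weasyprint.pdf")` would be equivalent to the shell command shown earlier.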

edkirin commented 6 years ago

Here are the HTML source files to reproduce the problem. Please note that file:///tmp/xx/src/ in the HTML should be replaced by the path where the resource files are unpacked.

src.zip

liZe commented 6 years ago

I thought that it was a duplicate of #550, but convert doesn't find any problem in your generated file. There must be something wrong with the files generated by pdfrw.

It's really frustrating to get these errors only with some implementations. At least ImageMagick (and thus GhostScript), Evince (and thus Poppler) and Google Chrome can read this PDF file perfectly. Is the problem in Cairo, WeasyPrint, pdfrw, or Cups? :cry:

jonlesser commented 6 years ago

I believe I am experiencing a similar issue. I am generating a PDF which I send to a label printer with the lp command. Some PDFs generated with WeasyPrint v41+ consistently print with missing parts.

I use Jinja2 to fill an HTML template string which I then pass to WeasyPrint to generate a PDF. When I inspect the HTML I always see the content as I expect. When I view the generated PDFs in a Chrome tab or in Preview, they always appear to have all of the content I expect.

There are not very many elements to my template: ID number, timestamp, item description, and a barcode that encodes the ID number. I can tweak these values in ways that will result in a fully printed label, but I don't understand why certain values result in fully printed labels and some result in partial labels. When it's partial, it consistently just prints the logo (svg) and a horizontal rule.

For example, a label with ID number 90809 results in a fully printed ticket, but simply changing the ID number to 92101 results in a partially printed label. I can likewise get some partial tickets by just tweaking the timestamp or the item description.

I initially observed this with 0.42.1. After reading this issue, I tried downgrading to previous versions. Version 0.41 also has this problem. Versions 0.40 and 0.39 do not have this problem. I cycled through versions with a simple "sudo pip install WeasyPrint==0.XX".
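
The version cycling described above could be recorded in a small script so the first failing release is easy to see. This is a rough sketch only: the `REPORTED` table contains just the outcomes stated in this comment, and `pip_install_command` and `first_bad_version` are hypothetical helper names.

```python
import sys

# Outcomes as reported in the comment above; other versions are untested here.
REPORTED = {
    "0.39": "ok",
    "0.40": "ok",
    "0.41": "partial print",
    "0.42.1": "partial print",
}


def pip_install_command(version):
    """Build the pip command that pins WeasyPrint to one release."""
    return [sys.executable, "-m", "pip", "install", "WeasyPrint==" + version]


def first_bad_version(outcomes):
    """Return the lowest version whose outcome is not 'ok', or None."""
    def key(version):
        return tuple(int(part) for part in version.split("."))

    bad = [v for v, outcome in outcomes.items() if outcome != "ok"]
    return min(bad, key=key) if bad else None
```

With the table above, `first_bad_version(REPORTED)` points at 0.41, which matches the regression described in this comment.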

The attached zip file contains a PDF generated with 0.40 and one generated with 0.41 along with a photo of the resulting labels. I also attached the output of pip freeze and dpkg -l.

v41regression.zip pip_freeze.txt dpkg_versions.txt

Python snippet where I'm generating the PDF:

  # Fill HTML template.
  data = json.loads(message.data)
  template_str = template.render(data=data)

  # Convert HTML to a PDF, save it to disk, and send it to the printer.
  html = weasyprint.HTML(string=template_str, url_fetcher=barcode_fetcher)
  with tempfile.NamedTemporaryFile(mode='w') as tmp_file:
    tmp_file.file.write(html.write_pdf())
    tmp_file.file.flush()
    # If we can't print, we won't ack the message.
    try:
      output = subprocess.check_output(
          ['/usr/bin/lp', '-o', 'outputorder=reverse', tmp_file.name],
          stderr=subprocess.STDOUT)
      logging.info('lp output: %s', output)
    except subprocess.CalledProcessError as e:
      logging.error('lp complained. `%s` returned code %d. Output: %s',
                    e.cmd, e.returncode, e.output)

liZe commented 6 years ago

I wonder whether #596 is a duplicate of this bug.

@edkirin @jonlesser Which version of Cairo do you use? If you have 1.14.x, could you please try 1.15.x?
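
For anyone unsure how to answer the question above, the Cairo version that Python actually loads can be read through cairocffi. A minimal sketch, assuming cairocffi is the binding in use; `decode_cairo_version` and `installed_cairo_version` are hypothetical helper names.

```python
def decode_cairo_version(encoded):
    """Split cairo's integer version (major*10000 + minor*100 + micro)."""
    major, rest = divmod(encoded, 10000)
    minor, micro = divmod(rest, 100)
    return (major, minor, micro)


def installed_cairo_version():
    """Return the cairo version seen by cairocffi, or None if unavailable."""
    try:
        import cairocffi  # assumption: cairocffi is installed alongside WeasyPrint
    except (ImportError, OSError):
        # OSError covers the case where the shared cairo library cannot be loaded.
        return None
    return decode_cairo_version(cairocffi.cairo_version())
```

A result of `(1, 14, 6)` would correspond to the 1.14.x series asked about above.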

jonlesser commented 6 years ago

I currently have the libcairo2 1.14.6-1 package installed on my Ubuntu 16.04 system. That's the latest for 16.04. There is a 1.15.10-2 package for Ubuntu 18.04, but upgrading is not an option for me right now.

liZe commented 6 years ago

I have no problem reading these files with various PDF viewers or converters. As the text is missing, I'm pretty sure now that this issue and #596, #550 and #523 are the same. They all appear with PDF files generated by Cairo 1.14 with a version of WeasyPrint based on pdfrw.

Based on how pdfrw works and on the different results I get using Python 2 and Python 3, I think that the problem is caused by the way pdfrw "shuffles" data. PDF files generated by Cairo 1.14 and modified by pdfrw are sometimes "wrong". PDF generation changed a lot in 1.15.4 and is now (hopefully) fixed.

It's hard to know where the "real" bug is. The v41 PDF provided by @jonlesser works with a lot of various implementations and is OK according to some validators I've found online. If anyone knows a PDF guru, we'd be happy to know what's wrong in these documents, or at least what's different between your documents with different IDs.
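
One rough way to look for differences between two such documents is to compare their object counts and the `/Type`, `/Subtype` and `/Filter` names they contain. The sketch below is a crude byte scan, not a PDF parser: it does not decompress streams and can be fooled by binary data that happens to look like an object header. `summarize_pdf` and `diff_summaries` are hypothetical helper names.

```python
import re
from collections import Counter

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
NAME_RE = re.compile(rb"/(Type|Subtype|Filter)\s*/([A-Za-z0-9]+)")


def summarize_pdf(data):
    """Count indirect objects and /Type, /Subtype, /Filter values in raw PDF bytes."""
    names = Counter(
        (key.decode(), value.decode()) for key, value in NAME_RE.findall(data)
    )
    return {"objects": len(OBJ_RE.findall(data)), "names": names}


def diff_summaries(a, b):
    """Return only the entries whose counts differ between two summaries."""
    diff = {}
    if a["objects"] != b["objects"]:
        diff["objects"] = (a["objects"], b["objects"])
    for key in sorted(set(a["names"]) | set(b["names"])):
        if a["names"][key] != b["names"][key]:
            diff[key] = (a["names"][key], b["names"][key])
    return diff
```

Reading each file with `open(path, "rb").read()` and passing the two summaries to `diff_summaries` gives a short list of structural differences to start from. An empty result does not prove the files are equivalent.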

liZe commented 6 years ago

@jonlesser I've found that a bug has been fixed in Ghostscript 9.21, and Ubuntu 16.04 provides only 9.18. Are you able to update Ghostscript?
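
To check which Ghostscript is installed against the 9.21 threshold mentioned above, something like the following could be used. A minimal sketch, assuming the `gs` executable is on the PATH; `parse_gs_version`, `ghostscript_version` and `has_fix_mentioned_above` are hypothetical helper names.

```python
import re
import subprocess


def parse_gs_version(text):
    """Turn output such as '9.18' into a tuple like (9, 18)."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return (int(match.group(1)), int(match.group(2)))


def ghostscript_version():
    """Return the installed Ghostscript version, or None if gs cannot be run."""
    try:
        out = subprocess.check_output(["gs", "--version"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gs_version(out.decode())


def has_fix_mentioned_above(version):
    """True if the version is at least 9.21, the release named in this comment."""
    return version >= (9, 21)
```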

liZe commented 6 years ago

@edkirin What's your version of Ghostscript?

edkirin commented 6 years ago

@liZe Ghostscript 9.18, the default one which comes with Linux Mint 18.2.

liZe commented 6 years ago

> Ghostscript 9.18, the default one which comes with Linux Mint 18.2.

Then I'd be really interested to know if you get the same problem with Ghostscript 9.21+ (even on another computer if you can't upgrade Ghostscript on this one).

Kozea / WeasyPrint

Problem with printing weasyprint generated PDF #565