SebastiaanKlippert / go-wkhtmltopdf

Golang commandline wrapper for wkhtmltopdf
MIT License

Multiple pages from memory #54

Closed by yiyue115 4 years ago

yiyue115 commented 4 years ago

Hi, I want to create a multi-page document from a [][]byte.

My current code looks like this, where pages is of type [][]byte:

for _, p := range pages {
    if len(p) == 0 {
        return nil, errors.New("blank page encountered")
    }
    page := wkhtmltopdf.NewPageReader(strings.NewReader(string(p)))
    pdfg.AddPage(page)
}

However, it seems to produce only one page of content, with the rest of the pages being blank. I'm guessing it has something to do with this part of func (pdfg *PDFGenerator) run() error, where the page content is only read once:

for _, page := range pdfg.pages {
    if page.Reader() != nil {
        cmd.Stdin = page.Reader()
        break
    }
}

Do you have any suggestions on how I should modify my code, or could there be an issue in the current code? Thanks a lot, and I'm looking forward to your reply!

SebastiaanKlippert commented 4 years ago

Hello,

Unfortunately, it is not possible to add multiple input pages from stdin. This is more a wkhtmltopdf/OS limitation than a bug here. The break is intentional because it is not possible to have multiple stdin inputs per call. wkhtmltopdf accepts a list of filenames and/or web pages, or a hyphen (-) to read from stdin, but you can have only one stdin source. This is also documented on PageReader and in the readme.
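
To make the "only one stdin source" point concrete, here is a minimal, standard-library-only sketch of merging several in-memory documents into a single stream before it would be handed to a PageReader. The helper name singleStream is hypothetical and not part of go-wkhtmltopdf, and the sketch assumes that plain concatenation is acceptable for your HTML, which is not guaranteed.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// singleStream presents several in-memory documents as one io.Reader that
// yields them back to back. Hypothetical helper, not part of go-wkhtmltopdf.
func singleStream(pages [][]byte) io.Reader {
	readers := make([]io.Reader, 0, len(pages))
	for _, p := range pages {
		readers = append(readers, bytes.NewReader(p))
	}
	// io.MultiReader reads each reader in turn until all are exhausted.
	return io.MultiReader(readers...)
}

func main() {
	pages := [][]byte{[]byte("<h1>one</h1>"), []byte("<h1>two</h1>")}
	r := singleStream(pages)

	// In the setup from this thread, r is the kind of value that would be
	// passed to wkhtmltopdf.NewPageReader. Here it is only read and printed.
	out, err := io.ReadAll(r)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(string(out)) // prints: <h1>one</h1><h1>two</h1>
}
```

The result is still a single input document from wkhtmltopdf's point of view, so whether it renders as intended depends entirely on the combined HTML.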

But it seems you might be confusing input and output here. It is possible to create multi-page documents from a single input document. The input is called page (in wkhtmltopdf) because it refers to a webpage, not a PDF page. Document (as in HTML document) might be a better name.

The HTML is rendered as normal and then printed as a PDF document, so there is no theoretical limit to the number of pages in the output. If your HTML is long enough, it will be rendered to multiple pages. If you want to force page breaks, you can insert HTML page breaks and use print-media types if needed.

The issue is probably in your HTML code and not here, so I cannot say why your pages are blank, but [][]byte will not work as input. So I see a couple of options:

A third option, and probably the simplest one, is just appending the raw HTML files in a []byte, but whether that renders correctly will depend on your HTML, so no guarantees here.

The following sample seems to render correctly (make sure the HTML documents begin with <!doctype html> and have a body). Using a buffer for readability:

    pdfg, err := NewPDFGenerator()
    if err != nil {
        log.Fatal(err)
    }

    htmlfile1, err := ioutil.ReadFile("D:\\header.html")
    if err != nil {
        log.Fatal(err)
    }

    htmlfile2, err := ioutil.ReadFile("D:\\body.html")
    if err != nil {
        log.Fatal(err)
    }

    htmlfile3, err := ioutil.ReadFile("D:\\footer.html")
    if err != nil {
        log.Fatal(err)
    }

    buf := new(bytes.Buffer)
    buf.Write(htmlfile1)
    buf.Write(htmlfile2)
    buf.Write(htmlfile3)

    pdfg.AddPage(NewPageReader(buf))

    err = pdfg.Create()
    if err != nil {
        log.Fatal(err)
    }

    err = pdfg.WriteFile("./testfiles/merged.pdf")
    if err != nil {
        log.Fatal(err)
    }

This will render everything on one output page. To force page breaks, you could do something like this:

    buf := new(bytes.Buffer)
    buf.Write(htmlfile1)
    buf.WriteString(`<P style="page-break-before: always">`)
    buf.Write(htmlfile2)
    buf.WriteString(`<P style="page-break-before: always">`)
    buf.Write(htmlfile3)

    pdfg.AddPage(NewPageReader(buf))
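
Applied to the [][]byte from the original question, the same idea could be sketched as a small helper. This is only an illustration under assumptions: mergePages is a hypothetical name, it reuses the page-break paragraph from the snippet above as the separator, and it keeps the question's blank-page check. It uses only the standard library, so it runs without wkhtmltopdf installed.

```go
package main

import (
	"bytes"
	"fmt"
)

// pageBreak is the separator used in the snippet above.
const pageBreak = `<P style="page-break-before: always">`

// mergePages joins the given HTML fragments into one document body,
// inserting a page break between consecutive fragments.
// Hypothetical helper, not part of go-wkhtmltopdf.
func mergePages(pages [][]byte) ([]byte, error) {
	var buf bytes.Buffer
	for i, p := range pages {
		if len(p) == 0 {
			return nil, fmt.Errorf("blank page encountered at index %d", i)
		}
		if i > 0 {
			buf.WriteString(pageBreak)
		}
		buf.Write(p)
	}
	return buf.Bytes(), nil
}

func main() {
	pages := [][]byte{[]byte("<h1>First</h1>"), []byte("<h1>Second</h1>")}
	merged, err := mergePages(pages)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The merged bytes would then be wrapped in a reader (for example with
	// bytes.NewReader) and added as the single PageReader.
	fmt.Printf("%s\n", merged)
}
```

No break is written before the first fragment, so the output should not start with an empty page.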

yiyue115 commented 4 years ago

Cool! Thanks so much :)