SebastiaanKlippert / go-wkhtmltopdf

Golang commandline wrapper for wkhtmltopdf
MIT License

Identify Blank PDF generated by go-wkhtmltopdf #100

Closed: jagdevsingh9709 closed this issue 10 months ago

jagdevsingh9709 commented 1 year ago

Is there any way to detect that go-wkhtmltopdf has generated a blank PDF? We generate PDFs from a dynamic HTML page URL, and the page may or may not have content. If it has content, the PDF is generated correctly. If the HTML content is empty, a blank PDF is generated (containing only the headers and footers we set on the page instance). We need to check whether the generated PDF has a blank/empty body and, if so, create a Kafka message and send it to a retry topic.

// URL of the input page
src := "<any html page url>"
// Create new PDF generator
pdfg, err := wkhtmltopdf.NewPDFGenerator()
if err != nil {
    logFunc.Error(err)
    pending.abort(job.ID)
    return nil, err
}
// Margins
pdfg.MarginLeft.Set(0)
pdfg.MarginRight.Set(0)

page := wkhtmltopdf.NewPage(src)

// Add headers and footers
if siteConf.HeaderHTML != "" {
    page.HeaderHTML.Set(siteConf.HeaderHTML)
    page.HeaderSpacing.Set(3.0)
}
if siteConf.FooterHTML != "" {
    page.FooterHTML.Set(siteConf.FooterHTML)
    page.FooterSpacing.Set(3.0)
}

// Add to document
pdfg.AddPage(page)

// Create PDF document in internal buffer
err = pdfg.Create()
if err != nil {
    logFunc.Error(err)
    pending.abort(job.ID)
    return nil, err
}

// Write buffer contents to file on disk
logFunc.Info("Writing PDF... ")
err = pdfg.WriteFile("./test.pdf")
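if err != nil {
    // return the error like the other branches instead of exiting the process
    logFunc.Error(err)
    pending.abort(job.ID)
    return nil, err
}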

out.bytes = pdfg.Bytes()
return out, nil

SebastiaanKlippert commented 1 year ago

Do you actually need to know if the PDF has content or do you just want to know if the source URL has content?

You could check the URL first:

srcResp, err := http.Get("https://example.com")
if err != nil {
    logFunc.Error(err)
    pending.abort(job.ID)
    return nil, err
}
defer srcResp.Body.Close()

// you could check the content length (if available; -1 means it is unknown)
if srcResp.ContentLength == 0 {
    err := errors.New("no content")
    logFunc.Error(err)
    pending.abort(job.ID)
    return nil, err
}

srcBody, err := io.ReadAll(srcResp.Body)
if err != nil {
    logFunc.Error(err)
    pending.abort(job.ID)
    return nil, err
}

// or the body
if len(srcBody) == 0 {
    err := errors.New("no body")
    logFunc.Error(err)
    pending.abort(job.ID)
    return nil, err
}

log.Println(string(srcBody)) //etc

pdfg, err := wkhtmltopdf.NewPDFGenerator()
if err != nil {
    log.Fatal(err)
}

// pass the body as reader
pdfg.AddPage(wkhtmltopdf.NewPageReader(bytes.NewReader(srcBody)))

err = pdfg.Create()
if err != nil {
    log.Fatal(err)
}

Or, if you actually want to check the PDF itself, that is not supported here, but you could use a PDF parser like https://github.com/dslipak/pdf
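
The URL-first check above could be wrapped in a small helper, sketched here under the assumption that a whitespace-only response should also count as empty. The names `isBlank`, `fetchSource`, and `errEmptySource` are hypothetical and not part of go-wkhtmltopdf; only the standard library is used.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// errEmptySource is a hypothetical sentinel error; a caller can match it
// with errors.Is to decide that the job belongs on the retry topic.
var errEmptySource = errors.New("source page has no content")

// isBlank reports whether body is empty or contains only whitespace.
func isBlank(body []byte) bool {
	return len(bytes.TrimSpace(body)) == 0
}

// fetchSource downloads url and returns its body, or errEmptySource
// when the response body is blank.
func fetchSource(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if isBlank(body) {
		return nil, errEmptySource
	}
	return body, nil
}

func main() {
	// No network access is needed to exercise the blank check itself.
	fmt.Println(isBlank([]byte(" \n\t")))
	fmt.Println(isBlank([]byte("<p>hello</p>")))
}
```

The returned body can then be passed to the generator through a reader, as in the example above. Note that this sketch would not catch a page that contains markup but no visible text; that case would need an HTML parser.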