benoitkugler / go-weasyprint

(WIP) Golang port of the WeasyPrint Python HTML-to-PDF library
https://pkg.go.dev/github.com/benoitkugler/go-weasyprint
BSD 3-Clause "New" or "Revised" License

Fix font handling #5

Open PylotLight opened 1 month ago

PylotLight commented 1 month ago

I never had to deal with fonts in Python, so I'm not sure why I'm being forced to define all this stuff I don't want to deal with here.

    fs, err := fc.LoadFontsetFile(fontmapCache)
    fontconfig := text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))

All I want here is a simple HTML-to-PDF conversion, from a string to a file.

    err := pdf.HtmlToPdf(os.Stdout, utils.InputString(html), fs)

benoitkugler commented 1 month ago

Your snippet is almost correct: just pass fontconfig instead of fs to HtmlToPdf.

The python implementation uses C dependencies to handle fonts. This module uses a pure Go implementation which uses an on-disk cache to store font information. We have chosen to expose the path to the font cache.

I'm working towards enabling go-text as a replacement for the text engine, so the way the FontConfiguration is created will change slightly in the future. The references to fcfonts.NewFontMap and fc.Standard will no longer be needed.

PylotLight commented 1 month ago

I don't have the fontmapCache file present so I can't get this example to work currently. Copying the test file exactly gives:

    var fontconfig text.FontConfiguration
    const fontmapCache = "pdf/test/cache.fc"
    fs, _ := fc.LoadFontsetFile(fontmapCache)
    fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))
    err := goweasyprint.HtmlToPdf(os.Stdout, utils.InputString(html), fontconfig)

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xff864a]

    goroutine 1 [running]:
    github.com/benoitkugler/textprocessing/pango.(*GlyphString).fallbackShape(0xc0008157c0, {0xc00013a0b0, 0x2c, 0x2c}, 0xc000694ff0)
        /home/User/go/pkg/mod/github.com/benoitkugler/textprocessing@v0.0.3/pango/glyphs.go:213 +0x14a

benoitkugler commented 1 month ago

See the file pdf/draw_test.go and this snippet:

// this command has to run once
fmt.Println("Scanning fonts...")
_, err := fc.ScanAndCache(fontmapCache)
if err != nil {
    log.Fatal(err)
}

fs, err := fc.LoadFontsetFile(fontmapCache)
if err != nil {
    log.Fatal(err)
}
fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))

PylotLight commented 1 month ago

2024/07/09 09:44:17 invalid font dir /usr/share/texmf/fonts/opentype/public stat /usr/share/texmf/fonts/opentype/public: no such file or directory

Ye this font stuff is just really not working for me. Perhaps I just wait for the replacement of these parts ;p I got go-wkhtmltopdf working for now, but I'll come back to this one if I can ever get it working.

benoitkugler commented 1 month ago

The error message is just a warning; it shouldn't be fatal. What is the error returned by fc.ScanAndCache?

PylotLight commented 1 month ago

Full example

func main() {
    html := ""

    var fontconfig text.FontConfiguration
    const fontmapCache = "pdf/test/cache.fc"
    fmt.Println("Scanning fonts...")
    _, err := fc.ScanAndCache(fontmapCache)
    if err != nil {
        log.Fatal(err)
    }

    fs, err := fc.LoadFontsetFile(fontmapCache)
    if err != nil {
        log.Fatal(err)
    }
    fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))
    err = goweasyprint.HtmlToPdf(os.Stdout, utils.InputString(html), fontconfig)
}

Scanning fonts...
2024/07/09 21:59:38 invalid font dir /usr/share/texmf/fonts/opentype/public stat /usr/share/texmf/fonts/opentype/public: no such file or directory
2024/07/09 21:59:39 open pdf/test/cache.fc: no such file or directory
exit status 1

benoitkugler commented 1 month ago

Thank you for the full example. There is something strange though: only one fatal log should appear (since log.Fatal exits the program). Could you be even more specific and print all the errors? (That is, add fmt.Println(err).)

PylotLight commented 1 month ago

That was with empty html string, this link has a sample page in it. https://pastecode.dev/s/twxqkbfe

Scanning fonts...
2024/07/10 00:42:37 invalid font dir /usr/share/texmf/fonts/opentype/public stat /usr/share/texmf/fonts/opentype/public: no such file or directory
open pdf/test/cache.fc: no such file or directory
loading font set: open pdf/test/cache.fc: no such file or directory
webrender.progress: 2024/07/10 00:42:40 Step 1 - Fetching and parsing HTML
webrender.progress: 2024/07/10 00:42:40 Step 3 - Applying CSS - 1 sheet(s)
webrender.progress: 2024/07/10 00:42:40 Step 4 - Creating formatting structure
webrender.progress: 2024/07/10 00:42:40 Step 5 - Creating layout - Page 1
webrender.progress: 2024/07/10 00:42:40 Step 6 - Drawing pages
webrender.progress: 2024/07/10 00:42:40 Step 7 - Adding PDF metadata
%PDF-1.7
4 0 obj
<</DecodeParms [ null ] /Filter [/FlateDecode] /Length 69 >>
stream
[compressed stream data omitted]
endstream
endobj
3 0 obj
<<
/Type/Page
/Parent 2 0 R
/MediaBox [0 0 595.27563 841.88983]
/BleedBox [0 0 595.27563 841.88983]
/TrimBox [0 0 595.27563 841.88983]
/Contents [4 0 R]
>>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids [3 0 R]>>
endobj
1 0 obj
<<
/Type/Catalog
/Pages 2 0 R
>>
endobj
5 0 obj
<<
/Producer (Go-WebRender 0.59)
>>
endobj
xref
0 6
0000000000 65535 f 
0000000401 00000 n 
0000000349 00000 n 
0000000178 00000 n 
0000000015 00000 n 
0000000449 00000 n 
trailer
<<
/Size 6
/Root 1 0 R
/Info 5 0 R
>>
startxref
500

benoitkugler commented 1 month ago

Could you post the exact Go sample you used? I still don't get why the program does not exit at the first log.Fatal.

PylotLight commented 1 month ago

package main

import (
    "fmt"
    "os"

    goweasyprint "github.com/benoitkugler/go-weasyprint"
    fc "github.com/benoitkugler/textprocessing/fontconfig"
    "github.com/benoitkugler/textprocessing/pango/fcfonts"
    "github.com/benoitkugler/webrender/text"
    "github.com/benoitkugler/webrender/utils"
)

func main() {
    html := `<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>My Website</title>
  </head>
  <body>
    <main>
        <h1>Welcome to My Website</h1>  
    </main>
  </body>
</html>
`

    var fontconfig text.FontConfiguration
    const fontmapCache = "pdf/test/cache.fc"
    fmt.Println("Scanning fonts...")
    _, err := fc.ScanAndCache(fontmapCache)
    if err != nil {
        fmt.Println(err.Error())
    }

    fs, err := fc.LoadFontsetFile(fontmapCache)
    if err != nil {
        fmt.Println(err.Error())
    }
    fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))
    err = goweasyprint.HtmlToPdf(os.Stdout, utils.InputString(html), fontconfig)
    if err != nil {
        fmt.Println(err.Error())
    }
}

benoitkugler commented 1 month ago

The issue is here:

_, err := fc.ScanAndCache(fontmapCache)
if err != nil {
    fmt.Println(err.Error())
}

I think the directories in the font cache path, defined as const fontmapCache = "pdf/test/cache.fc", don't exist on your machine.

Could you change this constant to something like <a directory I own/cache.fc>, or maybe simply cache.fc? Thank you.

PylotLight commented 1 month ago

But I don't have that file, and there would be no reason for me to, given it was never explained in any doc anywhere. Never mind, it might be working; I'll test it at work in the morning.

PylotLight commented 1 month ago

Alrighty, it wrote my file, but it didn't process the inline CSS inside the string like wkhtmltopdf does.

    // weasyprint
    var fontconfig text.FontConfiguration
    const fontmapCache = "cache.fc"
    fmt.Println("Scanning fonts...")
    _, err = fc.ScanAndCache(fontmapCache)
    if err != nil {
        return err
    }

    fs, err := fc.LoadFontsetFile(fontmapCache)
    if err != nil {
        return err
    }
    fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))
    file, err := os.Create(filename)
    if err != nil {
        return err
    }
    err = goweasyprint.HtmlToPdf(file, utils.InputString(buf.String()), fontconfig)
    if err != nil {
        return err
    }
    // wkhtml
    pdfg, err := wkhtmltopdf.NewPDFGenerator()
    if err != nil {
        log.Fatal(err)
    }
    pdfg.AddPage(wkhtmltopdf.NewPageReader(strings.NewReader(buf.String())))
    err = pdfg.Create()
    if err != nil {
        log.Fatal(err)
    }
    err = pdfg.WriteFile(filename)
    if err != nil {
        log.Fatal(err)
    }

benoitkugler commented 1 month ago

Can you post the exact HTML string you use? I didn't grasp which CSS you are referring to.

PylotLight commented 1 month ago

https://paste.ofcode.org/iCum4BQTjKeWcQkhexMVJp

benoitkugler commented 1 month ago

Thank you. Which CSS is not processed by go-weasyprint?

PylotLight commented 1 month ago

PDF result: AU_PRD_RITM17270697.pdf

wkhtmltopdf generates the correct coloring on each cell, and the correct content, from the same string. Here it just doesn't render properly for some reason.