Open ajstarks opened 2 months ago
hi Anthony,
do you have a simple reproducer ? (so I don't optimize something irrelevant for your use case)
note for self: parsepngstream
looks into the pHYs
ancillary PNG chunk. image/png.Decode
ignores it.
here's some biblio about it:
Here is a reproducer: It works by making 20 pages using copies of two files "hello-world.png", and "hello-mars.png" (attached) so that each process is a separate file. I prepared them like this:
for i in `seq -f %02g 0 19`
do
cp hello-world.png hello-world-$i.png
cp hello-mars.png hello-mars-$i.png
done
FYI, just processing the same files is a lot faster, (200ms vs. 1.1sec) but that does not match my use case.
package main
import (
"fmt"
"os"
"runtime/pprof"
"github.com/go-pdf/fpdf"
)
func main() {
// PDF page is 1920x1080
pw := 1920.0
ph := 1080.0
// set up the page
pageconfig := fpdf.InitType{
UnitStr: "pt",
SizeStr: "Widescreen",
Size: fpdf.SizeType{Wd: pw, Ht: ph},
FontDirStr: "",
}
pdf := fpdf.NewCustom(&pageconfig)
imgopt := fpdf.ImageOptions{AllowNegativePosition: true}
// use copies of files to exercise processing individual files
f1 := []string{}
f2 := []string{}
for i := 0; i < 20; i++ {
f1 = append(f1, fmt.Sprintf("hello-world-%02d.png", i))
f2 = append(f2, fmt.Sprintf("hello-mars-%02d.png", i))
}
// Profiling...
pf, err := os.Create("default.pgo")
if err != nil {
fmt.Fprintf(os.Stderr, "could not create CPU profile: ", err)
os.Exit(1)
}
defer pf.Close()
if err := pprof.StartCPUProfile(pf); err != nil {
fmt.Fprintf(os.Stderr, "could not start CPU profile: ", err)
os.Exit(1)
}
defer pprof.StopCPUProfile()
// make pages: four images per page, scaled 50%
// substitute discrete files for f1[i], and f2[i] to
// test processing the sames files in the loop
scale := 50.0
iw := pw * (scale / 100)
ih := ph * (scale / 100)
for i := 0; i < len(f1); i++ {
pdf.AddPage()
pdf.ImageOptions(f1[i], 0, 0, iw, ih, false, imgopt, 0, "")
pdf.ImageOptions(f2[i], 960, 0, iw, ih, false, imgopt, 0, "")
pdf.ImageOptions(f2[i], 0, 540, iw, ih, false, imgopt, 0, "")
pdf.ImageOptions(f1[i], 960, 540, iw, ih, false, imgopt, 0, "")
}
// write the PDF
if err := pdf.OutputFileAndClose("pdfimg.pdf"); err != nil {
fmt.Fprintf(os.Stderr, "%v\n", err)
os.Exit(1)
}
}
here is its profile:
File: pdfimg
Type: cpu
Time: May 24, 2024 at 6:08pm (EDT)
Duration: 1.10s, Total samples = 990ms (89.86%)
Showing nodes accounting for 990ms, 100% of 990ms total
flat flat% sum% cum cum%
240ms 24.24% 24.24% 240ms 24.24% github.com/go-pdf/fpdf.(*wbuffer).u8 (inline)
150ms 15.15% 39.39% 150ms 15.15% hash/adler32.update
120ms 12.12% 51.52% 120ms 12.12% compress/flate.(*deflateFast).matchLen
100ms 10.10% 61.62% 100ms 10.10% runtime.memmove
80ms 8.08% 69.70% 960ms 96.97% github.com/go-pdf/fpdf.(*Fpdf).parsepngstream
80ms 8.08% 77.78% 80ms 8.08% runtime.memclrNoHeapPointers
50ms 5.05% 82.83% 50ms 5.05% compress/flate.(*huffmanEncoder).bitCounts (inline)
40ms 4.04% 86.87% 170ms 17.17% compress/flate.(*deflateFast).encode (inline)
20ms 2.02% 88.89% 20ms 2.02% compress/flate.(*huffmanBitWriter).writeCode
10ms 1.01% 89.90% 10ms 1.01% bytes.(*Reader).ReadByte
10ms 1.01% 90.91% 30ms 3.03% compress/flate.(*dictDecoder).tryWriteCopy (inline)
10ms 1.01% 91.92% 10ms 1.01% compress/flate.(*dictDecoder).writeCopy
10ms 1.01% 92.93% 10ms 1.01% compress/flate.byFreq.Less
10ms 1.01% 93.94% 10ms 1.01% compress/flate.byFreq.Swap
10ms 1.01% 94.95% 10ms 1.01% compress/flate.hash (inline)
10ms 1.01% 95.96% 10ms 1.01% compress/flate.reverseBits (inline)
10ms 1.01% 96.97% 10ms 1.01% crypto/sha1.blockAVX2
10ms 1.01% 97.98% 160ms 16.16% hash/adler32.(*digest).Write (inline)
10ms 1.01% 98.99% 10ms 1.01% runtime.procyield
10ms 1.01% 100% 20ms 2.02% sort.insertionSort
0 0% 100% 10ms 1.01% bytes.(*Buffer).Grow (inline)
0 0% 100% 230ms 23.23% bytes.(*Buffer).ReadFrom
0 0% 100% 30ms 3.03% bytes.(*Buffer).grow
0 0% 100% 30ms 3.03% bytes.growSlice
0 0% 100% 290ms 29.29% compress/flate.(*Writer).Write (inline)
0 0% 100% 30ms 3.03% compress/flate.(*byFreq).sort (inline)
0 0% 100% 280ms 28.28% compress/flate.(*compressor).encSpeed
0 0% 100% 10ms 1.01% compress/flate.(*compressor).fillStore
0 0% 100% 290ms 29.29% compress/flate.(*compressor).write
0 0% 100% 120ms 12.12% compress/flate.(*decompressor).Read
0 0% 100% 10ms 1.01% compress/flate.(*decompressor).huffSym (inline)
0 0% 100% 50ms 5.05% compress/flate.(*decompressor).huffmanBlock
0 0% 100% 90ms 9.09% compress/flate.(*huffmanBitWriter).indexTokens
0 0% 100% 110ms 11.11% compress/flate.(*huffmanBitWriter).writeBlockDynamic
0 0% 100% 20ms 2.02% compress/flate.(*huffmanBitWriter).writeTokens (inline)
0 0% 100% 10ms 1.01% compress/flate.(*huffmanEncoder).assignEncodingAndSize
0 0% 100% 90ms 9.09% compress/flate.(*huffmanEncoder).generate
0 0% 100% 40ms 4.04% compress/flate.NewWriter (inline)
0 0% 100% 40ms 4.04% compress/flate.NewWriterDict
0 0% 100% 400ms 40.40% compress/zlib.(*Writer).Write
0 0% 100% 40ms 4.04% compress/zlib.(*Writer).writeHeader
0 0% 100% 210ms 21.21% compress/zlib.(*reader).Read
0 0% 100% 10ms 1.01% crypto/sha1.(*digest).Write
0 0% 100% 10ms 1.01% crypto/sha1.block
0 0% 100% 10ms 1.01% github.com/go-pdf/fpdf.(*Fpdf).Close
0 0% 100% 970ms 97.98% github.com/go-pdf/fpdf.(*Fpdf).ImageOptions
0 0% 100% 10ms 1.01% github.com/go-pdf/fpdf.(*Fpdf).Output
0 0% 100% 10ms 1.01% github.com/go-pdf/fpdf.(*Fpdf).OutputFileAndClose
0 0% 100% 970ms 97.98% github.com/go-pdf/fpdf.(*Fpdf).RegisterImageOptions
0 0% 100% 970ms 97.98% github.com/go-pdf/fpdf.(*Fpdf).RegisterImageOptionsReader
0 0% 100% 10ms 1.01% github.com/go-pdf/fpdf.(*Fpdf).enddoc
0 0% 100% 960ms 96.97% github.com/go-pdf/fpdf.(*Fpdf).parsepng
0 0% 100% 10ms 1.01% github.com/go-pdf/fpdf.(*Fpdf).putpages
0 0% 100% 10ms 1.01% github.com/go-pdf/fpdf.(*idEncoder).bytes (inline)
0 0% 100% 410ms 41.41% github.com/go-pdf/fpdf.(*xmempool).compress
0 0% 100% 230ms 23.23% github.com/go-pdf/fpdf.(*xmempool).uncompress
0 0% 100% 10ms 1.01% github.com/go-pdf/fpdf.generateImageID
0 0% 100% 980ms 98.99% main.main
0 0% 100% 10ms 1.01% runtime.gcBgMarkWorker
0 0% 100% 10ms 1.01% runtime.gcBgMarkWorker.func2
0 0% 100% 10ms 1.01% runtime.gcDrain
0 0% 100% 10ms 1.01% runtime.gcDrainMarkWorkerDedicated (inline)
0 0% 100% 980ms 98.99% runtime.main
0 0% 100% 10ms 1.01% runtime.makeslice
0 0% 100% 50ms 5.05% runtime.mallocgc
0 0% 100% 10ms 1.01% runtime.markroot
0 0% 100% 10ms 1.01% runtime.markroot.func1
0 0% 100% 10ms 1.01% runtime.memclrNoHeapPointersChunked
0 0% 100% 40ms 4.04% runtime.newobject
0 0% 100% 10ms 1.01% runtime.suspendG
0 0% 100% 10ms 1.01% runtime.systemstack
0 0% 100% 30ms 3.03% sort.Sort
0 0% 100% 10ms 1.01% sort.partition
0 0% 100% 30ms 3.03% sort.pdqsort
Here are the test images:
I note that JPG vs. PNG processing is 5x faster (by converting the pngs to jpg via convert(1)
It looks like the hot spots these two nested loops, where the .u8 method is on the image data before it's compressed.
https://github.com/go-pdf/fpdf/blob/504c6dd8cc916cd7f2097877efd52cae5f1d8b18/png.go#L190-L200
and
https://github.com/go-pdf/fpdf/blob/504c6dd8cc916cd7f2097877efd52cae5f1d8b18/png.go#L210-L223
Based on profiling my application, image processing is the most CPU-intensive, with parsepngstream using a good bit of the time. Is there an opportunity to optimize this function? Perhaps using image/png from the standard library
Here is my profile: