HakanL / WkHtmlToPdf-DotNet

C# .NET Core wrapper for wkhtmltopdf library that uses Webkit engine to convert HTML pages to PDF.
GNU Lesser General Public License v3.0
372 stars 67 forks source link

Cpu 100% when generating pdf again after a restart #62

Closed evenlee closed 2 years ago

evenlee commented 2 years ago

I am using the lib for pdf export, and it works good when items are far less than a hundred, when the items are closing to a hundred or even larger and I generate pdf again, the cpu is from 50% to 100% and cannot be done within 10 mins, do we have noticed the issue, is there a workaround?

The IIS worker process will occupy most the CPU.

How to re-produce: 1) restart computer 2) run a report have 50 pages, works good 3) run a report again, the IIS worker process occupies all the cpu resource

evenlee commented 2 years ago

sorry, my fault, I researched my code, seems there is memory leaking.

I changed code from

        var converter = new SynchronizedConverter(new PdfTools());
        var bytes = converter.Convert(doc);
        var provider = new FileExtensionContentTypeProvider();
        var fileName = $"Report_{DateTime.Now.ToString("yyyy-MM-dd_hh-mm-ss")}.pdf";
        var contentType = string.Empty;
        if (!provider.TryGetContentType(fileName, out contentType))
        {
            contentType = "application/octet-stream";
        }
        var stream = new MemoryStream(bytes);
        return new FileStreamResult(stream, contentType)
        {
            FileDownloadName = fileName
        };

to

        using (var converter = new SynchronizedConverter(new PdfTools()))
        {
            var bytes = converter.Convert(doc);
            var provider = new FileExtensionContentTypeProvider();
            var fileName = $"Report_{DateTime.Now.ToString("yyyy-MM-dd_hh-mm-ss")}.pdf";
            var contentType = string.Empty;
            if (!provider.TryGetContentType(fileName, out contentType))
            {
                contentType = "application/octet-stream";
            }
            var stream = new MemoryStream(bytes);
            return new FileStreamResult(stream, contentType)
            {
                FileDownloadName = fileName
            };
        }

the issue has been fixed.

HakanL commented 2 years ago

You should have the converter global in your project, you shouldn't instantiate it for each request.

evenlee commented 2 years ago

Thanks for your advice @HakanL, I changed it to global, and the cpu load is improved a bit, and works more smoothly.

wapenshaw commented 2 years ago

@HakanL @evenlee Even I am having a similar issue as evenlee, how ever this happens on an azure linux webapp and randonly the CPU usage hits 100 and stays there. Restarting the web app is the only issue .

var converter = new SynchronizedConverter(new PdfTools());
                var doc = new HtmlToPdfDocument
                {
                    GlobalSettings =
                    {
                        ColorMode = ColorMode.Color,
                        Orientation = Orientation.Portrait,
                        PaperSize = PaperKind.A4,
                    },
                    Objects =
                    {
                        new ObjectSettings
                        {
                            PagesCount = true,
                            HtmlContent = outputAsString,
                            WebSettings = {DefaultEncoding = "utf-8"},
                            HeaderSettings = {FontSize = 9, Right = "Page [page] of [toPage]", Line = true, Spacing = 2.812}
                        }
                    }
                };

                    var pdfBytes = converter.Convert(doc);
                    var pdfStream = new MemoryStream(pdfBytes);
                    var pdfFileResult = new FileStreamResult(pdfStream, "application/pdf");
                    var pdfName = $"HMRC_MTD_VAT_RETURN_{periodKey}.pdf";
                    pdfFileResult.FileDownloadName = pdfName;

                    return pdfFileResult;

This is my code, I will try using the using keyword to properly use the IDisposable instance and update the situation. Meanwhile can you explain what You should have the converter global in your project meant?

HakanL commented 2 years ago

Having the converter global means you don't instantiate it on each call, but have one shared instance throughout your application.

wapenshaw commented 2 years ago

@HakanL Thank you, in my case thats the only place I use the PDF converter, adding the using statement fixed my issue. So thank you for that too! but I understand having global instance, I have implemented something like this in my project.

https://github.com/JakeSiewJK64/SWE20001/blob/e4bd84f7de659340fc7ee85aa1a0034687613868/src/Infrastructure/DependencyInjection.cs#L69

https://github.com/JakeSiewJK64/SWE20001/blob/e4bd84f7de659340fc7ee85aa1a0034687613868/src/Infrastructure/PDF/PDFConverterHelper.cs#L7

services.AddSingleton(typeof(IConverter), new SynchronizedConverter(new PdfTools()));
services.AddTransient<IPdfGenerator, PdfGenerator>();

and I am using the PdfGenerator across my application now. Do you suggest Injecting the PDF Service as a Transient one or Scoped one?

HakanL commented 2 years ago

It will be a lot slower if you instantiate the converter in each call. So the recommendation is to have it globally like you posted above. It should be a singleton as you should only have one single instance, the native library is not multi-threaded.