gmanny / Pechkin

.NET Wrapper for WkHtmlToPdf static DLL. Allows you to utilize the full power of the library.

100% CPU usage #12

Open Noodle56 opened 11 years ago

Noodle56 commented 11 years ago

I'm running Pechkin v0.5.8.1. The first time the following code executes it runs fine, but the second time it hangs at 100% CPU usage. This happens on my local 32-bit Windows 7 .NET 4.5 machine, as well as on our 64-bit Windows Server 2008 R2 .NET 4.5 server (IIS in 32-bit mode). Can you offer any thoughts? Looking at Pechkin's code, I think it's deadlocked, or the logging component is going crazy, but I don't know how to check.

Code:

        public static void WritePdfToStream(
            string html,
            System.IO.Stream stream,
            System.Drawing.Printing.PaperKind paperKind
            )
        {
            Pechkin.GlobalConfig config = new Pechkin.GlobalConfig()
            .SetPaperSize(
                kind: paperKind
                )
            .SetDocumentTitle("Document Title Here")
            .SetMargins(new System.Drawing.Printing.Margins(0, 0, 0, 0))
            ;

            Pechkin.IPechkin htmlRenderer = new Pechkin.Synchronized.SynchronizedPechkin(
                config: config
                );
            byte[] pdfBytes = htmlRenderer.Convert(
                html: html.Replace("<body>", "<body><style type=\"text/css\">html, body, p, table, tr, td { padding: 0; margin: 0; } table { font-size: 100%; } .page-break { clear: both; display: block; page-break-before: always; } .a5 { height: 1037px; width: 738px; } .a5rotated { height: 738px; width: 1037px; } .a6 { height: 725px; width: 512px; } .a6rotated { height: 512px; width: 725px; }</style>")
                );
            stream.Write(
                buffer: pdfBytes,
                offset: 0,
                count: pdfBytes.Length
                );
        }
shanefulmer commented 11 years ago

I'm also seeing similar behavior.

hokkaido commented 11 years ago

Same here

Sam7 commented 11 years ago

I have the same problem, but only intermittently, with the exact same HTML. It always works the first time, but if you leave it for a few minutes and come back, the same call results in 100% CPU for that thread.

Sam7 commented 11 years ago

Here are the log files. Working: http://textdump.net/raw/879/ Not working: http://textdump.net/raw/878/ (again, this is the exact same request with the exact same HTML and config, using SynchronizedPechkin).

timcroydon commented 11 years ago

Same here. I'm using SimplePechkin 0.5.8.1, downloaded via NuGet, and get the same behaviour in both an ASP.NET MVC app and LINQPad. I've only just come across the problem, so I haven't investigated any further myself yet.

Update: thanks to Luke Baughan for pointing out that SynchronizedPechkin is the version to use where requests may come from multiple threads. Downloading the Pechkin.Synchronized NuGet package and updating my references fixed my problem (which was not the same as the OP's, but I'll leave this comment in case it helps anyone).

gmanny commented 11 years ago

I think the excessive CPU usage is a sign of broken synchronization, i.e. when I use SynchronizedPechkin it works fine, but when I substitute SimplePechkin it hangs the second time the code runs.

So I guess it's either a bug in my SynchronizedPechkin, or you're using SimplePechkin somewhere else in the code (just double-check your references to be sure).

If you can reproduce this on your workstation, just do the following:

gmanny commented 11 years ago

Also, do you use any event handlers?

timcroydon commented 11 years ago

I'm not using event handlers. My original problem was indeed because I wasn't using SynchronizedPechkin. However, I've noticed that if I make a change to code in VS2010, recompile and run (using built-in web server) it then hangs unless I bounce the web server so there's something not quite right.

It hangs in the call to .Convert(...) (see here). The only other relevant thread stack is here.

gmanny commented 11 years ago

Yeah, I think I need to add checks to SimplePechkin. That way it'll be guaranteed that it's not used from other threads, and every user will get an understandable explanation of what to do. I'll make the changes shortly.
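A minimal sketch of the kind of check described above, assuming the wrapper simply remembers the thread that first touches the native library and refuses calls from any other thread. `AffinityGuard` and `Check` are hypothetical names, not part of Pechkin's API.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Pechkin): remembers which thread first used
// the native library and rejects calls made from any other thread.
public sealed class AffinityGuard
{
    private Thread _owner; // null until the first call

    public void Check()
    {
        Thread current = Thread.CurrentThread;

        // Atomically claim ownership on first use; returns the previous owner.
        Thread owner = Interlocked.CompareExchange(ref _owner, current, null);
        if (owner != null && !ReferenceEquals(owner, current))
        {
            throw new InvalidOperationException(
                "The native library was first used on thread " + owner.ManagedThreadId +
                " and cannot be called from thread " + current.ManagedThreadId +
                ". Route every call through one dedicated thread instead " +
                "(which is what SynchronizedPechkin does).");
        }
    }
}
```

Each public conversion method would call `Check()` before touching the native API, so misuse fails fast with a readable message instead of spinning at 100% CPU. Storing the `Thread` object rather than its ID avoids false passes if a managed thread ID is later reused.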

Kons commented 11 years ago

Same issue. This isn't a multithreading issue. The "synchronized" wrapper is supposed to ensure that no more than one thread accesses the native libraries at the same time. That's fine. But the problem described happens when the libraries are called at separate times, i.e. PDF rendered -> converter disposed -> pause -> another PDF rendering initiated -> hangs. From my understanding of your design, the following should work, provided no more than one thread executes that block of code at a time:

    Dim pechkin As Pechkin.SimplePechkin = Nothing

    Try
        Dim confirmationHtml As String = GetConfirmationhtml(confirmationId)

        Dim globalConfig As New Pechkin.GlobalConfig()
        pechkin = New Pechkin.SimplePechkin(globalConfig)
        Dim docConfig As New Pechkin.ObjectConfig()
        docConfig.SetCreateExternalLinks(True)
        docConfig.SetLoadImages(True)
        docConfig.SetCreateInternalLinks(True)
        docConfig.SetPrintBackground(True)

        Dim bytes As Byte() = pechkin.Convert(docConfig, confirmationHtml)

        Response.ContentType = "application/pdf"
        Response.AppendHeader("Content-Disposition", "attachment; filename=Confirmation.pdf")
        Response.OutputStream.Write(bytes, 0, bytes.Length)
        Response.Flush()

        mBypassPageRender = True

    Catch ex As Exception
    Finally
        If pechkin IsNot Nothing Then
            pechkin.Dispose()
        End If
    End Try
Kons commented 11 years ago

I also should add, that the assembly was compiled under .NET 2.0. Yes, I know it's horrible, but I'm constrained to use that version, unfortunately. Sometimes it's easier to climb Everest than convince the management to upgrade (no value to the shareholders).

So this could be the problem, of course, but it's hard to say. I had to throw out anything to do with the logging component, and I changed one line in SimplePechkin because it wasn't compiling under .NET 2.0:

Original: Marshal.WriteByte(buffer + strbuf.Length, 0);
Changed:  Marshal.WriteByte(new IntPtr(buffer.ToInt64() + (long)strbuf.Length), 0);
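The `IntPtr + int` operator only exists from .NET 4 onward, which is why the original line did not compile under .NET 2.0. A minimal sketch of the same pattern (copy a string's bytes into unmanaged memory, then write a NUL terminator after them) using only .NET 2.0-era calls. `NativeStrings.WriteNullTerminated` is a hypothetical helper, and the UTF-8 encoding is an assumption; SimplePechkin's real code may allocate and encode differently.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class NativeStrings
{
    // Hypothetical helper: copies text into a freshly allocated unmanaged buffer
    // as UTF-8 followed by a single NUL byte. The caller must release the
    // returned pointer with Marshal.FreeHGlobal.
    public static IntPtr WriteNullTerminated(string text)
    {
        byte[] strbuf = Encoding.UTF8.GetBytes(text);
        IntPtr buffer = Marshal.AllocHGlobal(strbuf.Length + 1);
        Marshal.Copy(strbuf, 0, buffer, strbuf.Length);

        // .NET 2.0-compatible pointer arithmetic: go through Int64 instead of
        // relying on the IntPtr + int operator introduced in .NET 4.
        IntPtr terminator = new IntPtr(buffer.ToInt64() + (long)strbuf.Length);
        Marshal.WriteByte(terminator, 0);
        return buffer;
    }
}
```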

Kons commented 11 years ago

All right, I think I understand now that the problem isn't simultaneous access, but rather thread affinity.

Correct me if I'm wrong: once a thread accesses the native libraries, the loaded native code may only be executed again on that same thread for the lifetime of the process; otherwise the process will hang.
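A small, self-contained illustration of the distinction between concurrency and affinity, with no Pechkin involved: a lock guarantees that only one caller is inside the block at a time, but each caller still runs on its own thread, so code that is only safe on its initializing thread is still reached from a different one. `LockIsNotAffinity` is a hypothetical name, and the "initialization" is only simulated.

```csharp
using System;
using System.Threading;

// Hypothetical demo (no Pechkin involved): two requests that never overlap,
// because they run strictly one after the other under the same lock.
public static class LockIsNotAffinity
{
    private static readonly object SyncRoot = new object();

    // Plays the role of "the thread that initialized the native library".
    private static Thread _initializingThread;

    // Returns true only if the second request ran on the initializing thread.
    public static bool SecondRequestRanOnInitializingThread()
    {
        _initializingThread = null;
        bool sameThread = false;

        for (int i = 0; i < 2; i++)
        {
            // Each request arrives on its own thread, as in a web server.
            var request = new Thread(() =>
            {
                lock (SyncRoot)
                {
                    if (_initializingThread == null)
                        _initializingThread = Thread.CurrentThread; // first use "initializes"
                    else
                        sameThread = ReferenceEquals(_initializingThread, Thread.CurrentThread);
                }
            });
            request.Start();
            request.Join(); // wait for it to finish: the lock is never even contended
        }

        return sameThread;
    }
}
```

The method returns `false`: mutual exclusion held the whole time, yet the second "conversion" still ran on a thread other than the one that initialized the library, which is the situation described above as the cause of the hang.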

Kons commented 11 years ago

By the way, I'm new to GitHub, so I couldn't find where to leave a message of appreciation. This is the best PDF converter and you took time to write a wrapper for it. Many many thanks for that. You're saving people so much time. This is a very very useful component. Awesome work! Spending your free time to share this with someone. Legend.

Kons commented 11 years ago

Ok, got the thing working under .NET 2.0 by writing a simple wrapper (dedicated background thread and wait handles). Damn legacy VB.NET projects that need to be maintained. Thank god most of my time is spent developing in .NET 4 and C#.

Like I said before, it's a thread-affinity issue, rather than concurrency only.

Once again, many thanks to the author for figuring this whole thing out and making it available to the rest of us. Huge time saving. Very much appreciated.

gmanny commented 11 years ago

@Kons Thank you for the words of appreciation :)

You've had the affinity issue all right, but if you look at the implementation of SynchronizedPechkin, you'll find that it's designed to solve this particular problem: it creates a singleton thread and forwards all calls to that thread.
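A minimal sketch of the "singleton thread" pattern described above, assuming .NET 4 (`BlockingCollection<T>`, `TaskCompletionSource<T>`). It is not SynchronizedPechkin's actual implementation; `SingleThreadDispatcher` and `Invoke` are hypothetical names. Each request carries its own completion object, so a call that times out cannot hand its late result to the next caller.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical: runs every submitted delegate on one long-lived background thread,
// so native code with thread affinity always sees the same thread.
public sealed class SingleThreadDispatcher
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public SingleThreadDispatcher(string name)
    {
        _thread = new Thread(() =>
        {
            // Blocks without busy-waiting until work arrives.
            foreach (Action work in _queue.GetConsumingEnumerable())
                work();
        });
        _thread.IsBackground = true;
        _thread.Name = name;
        _thread.Start();
    }

    // Runs func on the dedicated thread and waits for its result. Exceptions
    // thrown by func surface as an AggregateException from Task.Wait.
    public T Invoke<T>(Func<T> func, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<T>();
        _queue.Add(() =>
        {
            // Never let an exception escape: it would kill the dedicated thread.
            try { tcs.SetResult(func()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });

        if (!tcs.Task.Wait(timeout))
            throw new TimeoutException("The dedicated thread did not finish in time.");
        return tcs.Task.Result;
    }
}
```

With such a dispatcher, a request would wrap its whole create, convert and dispose sequence in one call, e.g. `dispatcher.Invoke(() => ConvertOnDedicatedThread(html), TimeSpan.FromSeconds(20))`, where `ConvertOnDedicatedThread` is a hypothetical method holding the SimplePechkin code.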

Perhaps it's not only "synchronized", but I couldn't find the right word for it :)

Kons commented 11 years ago

Synchronized is the right word. Your implementation is very good. Great to see it written in C#. Structured and easy to follow. Works perfectly in .NET 4 environment. And it turned out to be easy to adapt for .NET 2 as well. I hate having to write code for .NET 2 these days - always try to convince people to upgrade instead, but what can you do.

Anyhow, this is the first time that I've seen a situation where a component would only work on the thread that first accessed it. So maybe this information will help someone else.

If anyone needs the VB.NET code that I used to handle synchronization in VB.NET, please let me know.

Noodle56 commented 11 years ago

Ah - you guys are quick! Ok, wishing I'd been able to be more helpful on this.

@Kons: if you can share your code in VB.NET, I'm happy to repost it in C# for others; or @gmanny, will you be incorporating this into SynchronizedPechkin?

Also wanted to express my appreciation for this project. It's billion% better than the paid-for PDF generators I've used.

Kons commented 11 years ago

"It's billion% better than the paid-for PDF generators I've used." I second it. We've looked at several commercial ones so far (being a big company it's not a problem) to cover all our PDF conversion needs, but none worked out so well. Amazing.

There's no need to incorporate my solution into SynchronizedPechkin, because the latter handles everything very well already. My code is just for those poor souls, whose management is making them port stuff back to old versions of the framework.

Please see the code below (it was written in under an hour, but with reasonable care in mind). Once again, this is an alternate solution for those who have to use the old .NET:

Protected NotInheritable Class PechkinSync

    Private Shared mThread As System.Threading.Thread

    Private Shared mSync As Object = New Object()

    Private Shared mBgThreadWaitHandle As New System.Threading.ManualResetEvent(False)
    Private Shared mCallingThreadWaitHandle As New System.Threading.ManualResetEvent(False)

    Private Shared mSource As String
    Private Shared mResult As Byte()

    ''' <summary>
    ''' Not supposed to be instantiated.
    ''' </summary>
    ''' <remarks></remarks>
    Private Sub New()
    End Sub

    Shared Sub New()
        mThread = New System.Threading.Thread(AddressOf Run)
        mThread.IsBackground = True
        mThread.Name = "Pechkin_Thread"
        mThread.Start()
    End Sub

    Private Shared Sub Run()
        While True
            ' Wait for a request, then re-arm the event straight away. Resetting it
            ' only after signalling the caller (at the bottom of the loop) could
            ' swallow a request that arrived in between, making that caller time out.
            mBgThreadWaitHandle.WaitOne()
            mBgThreadWaitHandle.Reset()

            mResult = Nothing

            Dim pechkin As Pechkin.SimplePechkin = Nothing

            Try
                Dim globalConfig As New Pechkin.GlobalConfig()
                pechkin = New Pechkin.SimplePechkin(globalConfig)

                Dim docConfig As New Pechkin.ObjectConfig()
                docConfig.SetCreateExternalLinks(True)
                docConfig.SetLoadImages(True)
                docConfig.SetCreateInternalLinks(True)
                docConfig.SetPrintBackground(True)

                mResult = pechkin.Convert(docConfig, mSource)
            Catch ex As Exception
            Finally
                Try
                    If pechkin IsNot Nothing Then
                        pechkin.Dispose()
                    End If
                Catch ex As Exception

                End Try
                mCallingThreadWaitHandle.Set()
            End Try
        End While
    End Sub

    Public Shared Function Convert(ByVal htmlSource As String) As Byte()

        If String.IsNullOrEmpty(htmlSource) Then
            Throw New ArgumentException("htmlSource - value cannot be null or empty")
        End If

        SyncLock mSync

            mSource = htmlSource

            mCallingThreadWaitHandle.Reset()
            mBgThreadWaitHandle.Set()
            mCallingThreadWaitHandle.WaitOne(20000)

            mSource = Nothing
            Dim result As Byte() = mResult
            mResult = Nothing

            Return result
        End SyncLock
    End Function
End Class
Kons commented 11 years ago

Also, I'm thinking about changing my screen name to Freemanny. Then maybe me and Gmanny can go have a party in the Black Mesa. :)))

Noodle56 commented 11 years ago

@Kons - thanks bud :¬)

Here's a C# version if anyone needs it:

using System;

sealed class PechkinSync
{
    private static System.Threading.Thread _pdfThread;

    private static object _syncRoot = new object();
    private static System.Threading.ManualResetEvent _pdfThreadWaitHandle = new System.Threading.ManualResetEvent(false);

    private static System.Threading.ManualResetEvent _callingThreadWaitHandle = new System.Threading.ManualResetEvent(false);
    private static Pechkin.GlobalConfig _config;
    private static string _source;

    private static byte[] _result;

    /// <summary>
    /// Not supposed to be instantiated.
    /// </summary>
    /// <remarks></remarks>
    private PechkinSync()
    {
    }

    static PechkinSync()
    {
        PechkinSync._pdfThread = new System.Threading.Thread(PechkinSync.Run);
        PechkinSync._pdfThread.IsBackground = true;
        PechkinSync._pdfThread.Name = "Pechkin_Thread";
        PechkinSync._pdfThread.Start();
    }

    private static void Run()
    {
        while (true)
        {
            PechkinSync._pdfThreadWaitHandle.WaitOne();
            // Re-arm the event immediately. Resetting it at the bottom of the loop,
            // after the caller has been signalled, could swallow the next request.
            PechkinSync._pdfThreadWaitHandle.Reset();

            PechkinSync._result = null;

            Pechkin.SimplePechkin pechkin = null;

            try
            {
                // Honour the config passed to Convert (previously it was ignored).
                Pechkin.GlobalConfig globalConfig = PechkinSync._config ?? new Pechkin.GlobalConfig();
                pechkin = new Pechkin.SimplePechkin(globalConfig);

                Pechkin.ObjectConfig docConfig = new Pechkin.ObjectConfig();
                docConfig.SetCreateExternalLinks(true);
                docConfig.SetLoadImages(true);
                docConfig.SetCreateInternalLinks(true);
                docConfig.SetPrintBackground(true);

                PechkinSync._result = pechkin.Convert(docConfig, PechkinSync._source);
            }
            catch (Exception)
            {
            }
            finally
            {
                try
                {
                    if (pechkin != null)
                    {
                        pechkin.Dispose();
                    }
                }
                catch (Exception)
                {
                }
                PechkinSync._callingThreadWaitHandle.Set();
            }
        }
    }

    public static byte[] Convert(Pechkin.GlobalConfig config, string htmlSource)
    {
        if (string.IsNullOrEmpty(htmlSource))
        {
            throw new ArgumentException("htmlSource - value cannot be null or empty", "htmlSource");
        }

        lock (PechkinSync._syncRoot)
        {
            try
            {
                PechkinSync._config = config;
                PechkinSync._source = htmlSource;

                PechkinSync._callingThreadWaitHandle.Reset();
                PechkinSync._pdfThreadWaitHandle.Set();
                PechkinSync._callingThreadWaitHandle.WaitOne(20000);

                byte[] result = PechkinSync._result;
                return result;
            }
            finally
            {
                PechkinSync._config = null;
                PechkinSync._source = null;
                PechkinSync._result = null;
            }
        }
    }
}
mattstermiller commented 11 years ago

I have an ASP.NET 2010 web application, and I had the same problem as timcroydon:

However, I've noticed that if I make a change to code in VS2010, recompile and run (using built-in web server) it then hangs unless I bounce the web server so there's something not quite right.

I have tried using both SynchronizedPechkin and the PechkinSync class that Kons provided. It seems to work just fine until I recompile, then a request after a re-compile will hang the development server and max out a CPU core until I kill the server. PechkinSync does have the advantage of timing out and providing an error message, but the runaway thread remains.

I think the whole problem stems from keeping the library initialized. I fixed my issue by simply calling PechkinStatic.DeinitLib() at the end of SimplePechkin's Dispose() method. Then I just use SimplePechkin, using a simple lock to ensure that only one thread is using the library at one time. It might be doing extra work every time, but it never hangs, which is vastly more important.

In case someone wants my simple Sync class, which I inserted into my copy of the library:

namespace Pechkin
{
    public static class PechkinSync
    {
        private static object _syncRoot = new object();

        public static byte[] Convert(GlobalConfig config, string htmlSource)
        {
            lock (_syncRoot)
            {
                using (SimplePechkin pechkin = new SimplePechkin(config))
                {
                    return pechkin.Convert(htmlSource);
                }
            }
        }
    }
}
bUKaneer commented 11 years ago

@Kons & @Noodle56 Thanks for putting your code here. For some reason I was having the 100% CPU issue even with SynchronizedPechkin, but I used Noodle56's C# port of Kons's VB code and it seems to have settled down on the server now.

mattstermiller commented 11 years ago

@bUKaneer I'm glad to hear you seem to be having success with PechkinSync, but I was still able to break it. I'm willing to bet it's only a matter of time before you encounter the problem again. See my post above.

bUKaneer commented 11 years ago

@mattstermiller yeah, I've seen it lock up again today; I'll give your simplified version a go tomorrow ;o) Thanks for the update! We've also implemented a more drastic solution by writing a Windows exe that we call, so the conversion executes outside the web server altogether, but tbh I'd rather keep everything in one place: the fewer "oddities" the better imho!

Kons commented 11 years ago

@mattstermiller: you nailed it. The issue is that when you recompile, the web server process doesn't shut down and the native libraries aren't unloaded, but the original managed static thread reference is discarded. At this point since the library has been initialized by the thread that was discarded in the process of recompilation, it will hang after being accessed again by the new thread assigned to the static variable after recompilation.

I didn't investigate deeply, but this seems like a plausible chain of events.

Good find with PechkinStatic.DeinitLib(). I'll give that a try.

mattstermiller commented 11 years ago

Update - although it doesn't hang, my method no longer renders HTML after the first conversion. All conversions after the first will output a PDF with all of the text content of the HTML, no styles or rendered HTML at all. It doesn't have to do with calling Init and Deinit (I tried calling these several times before the first conversion and it still worked), but there's something different after the first cycle of init, convert, deinit. Unfortunately, I don't have time to investigate this any more right now (higher priorities elsewhere).

gmanny commented 11 years ago

I think @Kons pointed out the root cause of both this issue and #5:

the web server process doesn't shut down and the native libraries aren't unloaded, but the original managed static thread reference is discarded. At this point since the library has been initialized by the thread that was discarded in the process of recompilation, it will hang after being accessed again by the new thread assigned to the static variable after recompilation.

I'm not familiar with web development using .net, and I don't have time to investigate it, unfortunately :(

mattki commented 11 years ago

First of all thanks Gmanny for the great library.

I tried following in mattstermiller's footsteps to use SimplePechkin in my VS 2012 C# .NET web app project and had the same problem as him where it only renders plain text to the PDF with no formatting the second time around after a single successful conversion. Had to restart VS to get things working again.

I've since found a workaround (a bit nasty) which involves two steps in conjunction:

  1. Running the Pechkin wrapper in a separate AppDomain, then unloading the AppDomain after each run
  2. Unloading the unmanaged DLL wkhtmltox0.dll after each run

I ran this line inside its own AppDomain:

byte[] pdfBuf = simplePechkin.Convert(gc, new Uri(filePath + ".htm"));

I used this code to unload the native DLL after AppDomain disposal:

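    // Release every reference the process holds on the native DLL so the next
    // conversion loads a fresh copy. Only safe once nothing (including the
    // unloaded AppDomain) can still call into it.
    foreach (ProcessModule mod in Process.GetCurrentProcess().Modules)
    {
        if (string.Equals(mod.ModuleName, "wkhtmltox0.dll", StringComparison.OrdinalIgnoreCase))
        {
            // FreeLibrary decrements the module's reference count; keep calling
            // it until it fails, i.e. until the module has been unloaded.
            while (FreeLibrary(mod.BaseAddress))
            {
            }
        }
    }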

Along with these declarations:

    using System;                           // IntPtr
    using System.Diagnostics;               // Process, ProcessModule
    using System.Runtime.InteropServices;   // DllImport

    [DllImport("kernel32", SetLastError = true)]
    static extern bool FreeLibrary(IntPtr hModule);
mattstermiller commented 11 years ago

@mattki Thanks for sharing your find! I was afraid it would take drastic measures like that.

Is it really necessary to run the conversion in its own AppDomain? Could it be possible to unload (and maybe re-load) the DLL? I'm just trying to think of ways to streamline it so that there could be a wrapper class around Convert() that could hide this complexity.

mattki commented 11 years ago

Without the separate AppDomain I get an AccessViolationException thrown on calling the convert method with detail "Attempted to read or write protected memory" - weird since I create a new instance of SimplePechkin and call FreeLibrary on the dll each time around. Maybe there are some handles to unmanaged code sticking around somewhere which don't get removed when unloading the dll or re-pointed when creating a new SimplePechkin.

ianrathbone commented 11 years ago

Hey guys just wanted to add my experience with this issue.

I've been running a WCF service on a 64 bit 2008 R2 server for a while which uses SimplePechkin and noticed some 100% CPU Usage issues. It started to get pretty bad, and we're running quad core instances. So I tried switching over yesterday to the Synchronized version and so far so good, CPU usage is right down.

I'll be keeping an eye on it, but if you don't hear from me again on here, take it that it's solved the issue for me!

tuespetre commented 11 years ago

Howdy y'all... I ran across this yesterday and read the whole discussion. What I've done is taken the PechkinStatic class and rigged it up so that it has a private static AppDomain member. Any calls to PechkinBinding webkit methods are wrapped up in that AppDomain. DeInitlib on PechkinStatic now handles unloading the AppDomain and freeing the assembly. Dispose on SimplePechkin calls DeInitlib now. I have run the unit tests many many times and took the same steps to reproduce the problem to verify my results (that the problem is no more.) I added the FreeAssembly method onto the PechkinBindings class, and also gave that class five static members to hold the callbacks since it is in another AppDomain, otherwise garbage collection would sometimes nail the callbacks and cause problems.
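The garbage-collection problem mentioned above is a general P/Invoke pitfall: when a delegate is marshaled to native code as a function pointer, the runtime does not know the native side still holds it, so the delegate must stay reachable (for example in a static field) for as long as native code may call it. A minimal sketch; `ProgressCallback` and `CallbackHolder` are hypothetical and do not reproduce the real wkhtmltox callback signatures or Pechkin's bindings.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical native callback signature (the real wkhtmltox callbacks differ).
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate void ProgressCallback(int percent);

public static class CallbackHolder
{
    // Static field = a GC root. Without it, only the native side would hold
    // the function pointer, and the delegate could be collected while native
    // code is still calling it.
    private static ProgressCallback _progress;

    public static int LastPercent { get; private set; }

    public static IntPtr Register()
    {
        _progress = OnProgress;
        // This pointer is what would be handed to the native library.
        return Marshal.GetFunctionPointerForDelegate(_progress);
    }

    private static void OnProgress(int percent)
    {
        LastPercent = percent;
    }
}
```

The field must keep the delegate rooted until the native library can no longer invoke it, e.g. until after the library has been deinitialized.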

I am new to Github so I will try my best to upload the updated solution in whatever way I can. Anyone has questions, just let me know.

Thanks a million bazillion to everyone in this thread for the wonderful contribution, thanks especially to mattki for his discovery and to gmanny for starting this project.

tuespetre commented 11 years ago

Hey everyone, I found out in a difficult way that there is still an issue with my fork that will cause hanging. To prevent the hanging in this instance all I had to do was comment out the part of InitLib where the callbacks are registered to the wkhtmltox0 assembly. I am commenting from the road now but I anticipate working on this issue over the next couple of weeks as I have time.

tuespetre commented 11 years ago

Alright, I have it worked out. I will be uploading the new commit to my own fork shortly, along with updates.

tuespetre commented 11 years ago

Here you go everyone, details and code:

https://github.com/gmanny/Pechkin/pull/42

sbmuzammil commented 10 years ago

Here is the thread stack when it hangs with SynchronizedPechkin:

    wkhtmltox0.dll!_ZN11wkhtmltopdf14ImageConverter11qt_metacastEPKc+0x110b8de
    wkhtmltox0.dll!_ZN11wkhtmltopdf14ImageConverter11qt_metacastEPKc+0x10b2efb
    wkhtmltox0.dll!_ZN11wkhtmltopdf9Converter16emitCheckboxSvgsERKNS_8settings8LoadPageE+0x7f
    wkhtmltox0.dll!wkhtmltopdf_convert+0x14
    clr.dll+0x2cf7
    clr.dll+0x2952
    clr.dll!DllUnregisterServerInternal+0x18d93
    clr.dll!GetMetaDataInternalInterface+0xe5a8
    clr.dll!GetMetaDataInternalInterface+0xe4a7
    mscorlib.ni.dll+0x2d371d
    mscorlib.ni.dll+0x2cf8fa
    mscorlib.ni.dll+0x30cacf
    mscorlib.ni.dll+0x3023d7
    mscorlib.ni.dll+0x302316
    mscorlib.ni.dll+0x3022d1
    mscorlib.ni.dll+0x30cb4c
    clr.dll+0x2952
    clr.dll!DllUnregisterServerInternal+0x18d93
    clr.dll!DllUnregisterServerInternal+0x195d9
    clr.dll!DllGetClassObjectInternal+0x10e29
    clr.dll!DllGetClassObjectInternal+0x135f0
    clr.dll!DllGetClassObjectInternal+0x1365e
    clr.dll!DllGetClassObjectInternal+0x1372b
    clr.dll!GetMetaDataInternalInterfaceFromPublic+0x21db2
    clr.dll!GetMetaDataInternalInterfaceFromPublic+0x21e1b
    clr.dll!GetMetaDataInternalInterfaceFromPublic+0x21d98
    clr.dll!DllGetClassObjectInternal+0x1365e
    clr.dll!DllGetClassObjectInternal+0x1372b
    clr.dll!DllUnregisterServerInternal+0x22e3
    clr.dll!DllGetClassObjectInternal+0x10ce5
    clr.dll!DllGetClassObjectInternal+0x13baf
    ntdll.dll!RtlInitializeExceptionChain+0x63
    ntdll.dll!RtlInitializeExceptionChain+0x36

tuespetre commented 10 years ago

Thank you for that, sbmuzammil. Please look at my fork of Pechkin, or the pull request at https://github.com/gmanny/Pechkin/pull/42 for a solution.