Open ephestione opened 6 years ago
We got a similar issue. We use dompdf v1.0.2 running on a virtual Debian 10 LAMP server to generate 100 or more multiple-choice tests at a time, which we then use for large exam sessions. Our PHP script uses the "output method" to render PDF files. I mean: we mix strings grabbed from a db with static HTML5 and fill an $output variable that we afterwards use to render the PDF files. We use a linked CSS stylesheet, a single very small JPG logo in our header, no nested tables, pure HTML5 tags; we do not stream but save and zip our 37 to 40 KB rendered PDFs. Our PDF files all look OK until we ask our function to render (for some strange reason we never understood or solved) more than 136 tests. Starting from the 137th, all the others are broken. What essentially gets broken is the header part of the HTML: a few divs start to get ignored or messed up. Our script takes up to 180 seconds to process 150 PDF files, and we set ini_set('max_execution_time', 0); at the very beginning of our PHP script. I would appreciate it if someone could give us a clue as to why this happens.
Maybe a timeout bug?
So far I've been unable to reproduce this particular issue. If you use a core font (e.g. Helvetica) do you experience the issue? Can you post a sample of your HTML+CSS as well as the relevant PHP? Preferably samples that you can reproduce the issue with.
Actually Helvetica is exactly the font we use to render our PDFs. It's pretty hard to post our PHP application here because it consists of about 1000 lines of code that we wouldn't like to share at this very moment. I will find out whether it's possible to reproduce the same issue in a similar, simpler piece of code and come back with another post.
By the way, a few more details about how we use the library may help you reproduce the rendering issue.
What we basically do, during a single procedural PHP script execution, is grab a bunch of UTF-8 (utf8mb4) MariaDB varchar(250) strings from several related tables in the db. We position these strings randomly inside HTML tags, and meanwhile we do a bunch of other stuff: we feed an operation progress bar, we create logs, and so on.
Actually we do not create an HTML file directly. Instead we keep filling an $output variable using $output.= ' '; until its content is complete. We then use it as the HTML text to feed the document method, and we render the PDF. We then save the rendered PDF and proceed to build the next one, basically using a nested while loop. Our tests contain 4 sections with up to 8 questions per section; each question has 3 different answers, and 25 questions in total are required per test. Thus we build 4 A4 portrait pages per test, plus another two correction sheets for teachers.
6 PDF pages in total per test, hundreds of times over, is ultimately what we're asking dompdf to do for us.
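The string-building approach described above could look roughly like the following. This is only a minimal sketch: the function name `buildTestHtml`, the array layout of the sections, and the HTML tags used are assumptions for illustration, not the poster's actual code.

```php
<?php
// Hypothetical sketch: function name and data layout are assumptions.

/**
 * Build the HTML for one test by appending to a string,
 * shuffling the answers of each question.
 *
 * @param array $sections [sectionTitle => [[question, [answers...]], ...]]
 */
function buildTestHtml(array $sections): string
{
    $output = '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>';
    foreach ($sections as $title => $questions) {
        $output .= '<section><h2>' . htmlspecialchars($title) . '</h2><ol>';
        foreach ($questions as [$question, $answers]) {
            shuffle($answers); // random positioning of the answers
            $output .= '<li><p>' . htmlspecialchars($question) . '</p><ul>';
            foreach ($answers as $answer) {
                $output .= '<li>' . htmlspecialchars($answer) . '</li>';
            }
            $output .= '</ul></li>';
        }
        $output .= '</ol></section>';
    }
    return $output . '</body></html>';
}

// One section with one question and three answers.
$html = buildTestHtml(['Section 1' => [['Q1 <b>?', ['a', 'b', 'c']]]]);
```

The returned string is what would then be handed to Dompdf (for example through its `loadHtml()` method) before rendering.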
Everything works pretty fine until our script renders pdf number 136.
That number changes if, for example, we get rid of the png logo at the front left of the header. In that case it goes up to somewhere around 14x.
CSS is not involved in causing the issue. It also happens without any style applied.
The problem really seems to be the number of PDFs the library is able to render consecutively. A full buffer? A timeout?
Consider that our tests are completely different from one another, but the issue happens in exactly the same spot every time.
Sounds like we found the conflict that was breaking our PDFs. Our script was the post action of a form and used flush()/ob_flush() to update a progress bar while creating our tests. Setting up the PHP script as an exec "background worker" and getting rid of the progress bar solved the issue.
@stefanolanci that's ... interesting? I'm not sure why using output buffering in that way would cause a problem, so long as you're not modifying the buffer contents while Dompdf is running. I'll have to look at that particular scenario and see if there's any obvious issue.
One way you could enable the progress bar using output buffering would be to kick off the PDF rendering logic in a background worker. Within that separate process you can update a file indicating the progress. Then your original script can read out the contents of the file to determine the progress and flush an update.
These days, though, most progress bars do it a little more asynchronously. You make one call to kick off the PDF rendering process in a background worker and return to the browser. Then the browser has some logic (AJAX or full page refresh) to query the status, using a messaging mechanism similar to the one indicated above.
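As a rough illustration of the progress-file idea, the worker and the web side could share a small file like this. It is only a sketch under assumptions: the helper names `writeProgress` and `readProgress`, the JSON layout, and the temp-file location are all hypothetical.

```php
<?php
// Hypothetical helpers: names, JSON layout and file location are assumptions.

/** Worker side: record progress by writing a temp file and renaming it into place. */
function writeProgress(string $file, int $done, int $total): void
{
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode(['done' => $done, 'total' => $total]));
    // rename() within one filesystem replaces the file in a single step on POSIX
    // systems, so a reader should never see a half-written file.
    rename($tmp, $file);
}

/** Web side: read the latest progress, or null if nothing has been written yet. */
function readProgress(string $file): ?array
{
    if (!is_file($file)) {
        return null;
    }
    $data = json_decode((string) file_get_contents($file), true);
    return is_array($data) ? $data : null;
}

// The worker would call writeProgress() after each rendered PDF; a status
// endpoint polled by the browser would return the result of readProgress().
$file = sys_get_temp_dir() . '/pdf_progress_' . uniqid() . '.json';
writeProgress($file, 3, 150);
$progress = readProgress($file);
echo $progress['done'] . '/' . $progress['total'], PHP_EOL;
unlink($file);
```

Writing to a temporary file and renaming it is a design choice to keep the reader from ever parsing a partially written status; a database row or a cache entry would serve the same purpose.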
What we did is pretty similar to what you suggested. We now have an exec background worker (bgw) and a db table holding the "processing status" of the bgw. The table has 5 fields: two bools, two ints and a char. The first bool is a "permission to execute" bit handled by the action script that calls the bgw script, kind of like automotive control logic. The second bool just reports whether the bgw is running, the first int holds the number of PDFs rendered so far, the second int holds the number of PDFs to be rendered, and the char gets populated with the bgw logs. We got rid of output buffering; we do not need it any longer. Instead we programmed a pop-up window that queries the processing status table and shows up each time users request new tests while the background worker is already doing its job. Everything is working like a charm.
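A status table along those lines might be sketched as follows. Everything named here is an assumption: the table name `processing_status`, the column names, and the use of an in-memory SQLite database through PDO (chosen only so the sketch is self-contained; the thread describes a MariaDB table, and this requires the pdo_sqlite extension).

```php
<?php
// Hypothetical schema: table and column names are assumptions, and SQLite
// stands in for the MariaDB table described in the post.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec(
    "CREATE TABLE processing_status (
        may_run   INTEGER NOT NULL DEFAULT 0,
        running   INTEGER NOT NULL DEFAULT 0,
        rendered  INTEGER NOT NULL DEFAULT 0,
        requested INTEGER NOT NULL DEFAULT 0,
        log       TEXT    NOT NULL DEFAULT ''
    )"
);

// Action script: grant permission to run and record how many PDFs are requested.
$db->exec("INSERT INTO processing_status (may_run, running, requested) VALUES (1, 1, 150)");

// Worker side: after each rendered PDF, bump the counter and append a log line.
// (|| is SQLite string concatenation; MariaDB would typically use CONCAT().)
$update = $db->prepare(
    "UPDATE processing_status SET rendered = rendered + 1, log = log || :line"
);
for ($i = 1; $i <= 3; $i++) {
    $update->execute([':line' => "rendered test $i\n"]);
}

// Pop-up side: poll the single status row to display progress.
$status = $db->query('SELECT running, rendered, requested, log FROM processing_status')
             ->fetch(PDO::FETCH_ASSOC);
printf("%d of %d rendered\n", $status['rendered'], $status['requested']);
```

Keeping the counters in a single row means the pop-up only needs one cheap SELECT per poll, independent of how many PDFs have been produced.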
"Everything is working like a charm." I spoke too soon... Just out of curiosity, today I forced our script to render up to 1000 PDFs. Of course we don't need such a big number of exam papers; I just wanted to check whether the new method of rendering PDFs using an exec background worker would work consistently. It didn't. The same issue mentioned before now happens when PDF number 589 is rendered...
This is what the header of the correction sheet looks like from PDF number 1 to 588:
And this is what it looks like starting at PDF number 589:
The exam sheet gets garbled even worse:
Who can explain why that happens? That's a challenge.
Hello,
yes, I can actually confirm the behaviour mentioned in this issue. PDFs basically "collapse", with all spacing gone and all text overlapping itself. It seems to be a problem when running multiple jobs in a row. I will perform some testing and provide new details, if any.
My application usually creates one PDF at a time, so I wrote this function:
and put it in my functions.inc.php, which is included everywhere else. Now I have to cycle through several years and a few reports for each year, and create PDFs out of them, so in a while loop I call this function several times. Invariably, after the fourth PDF or so, the results are severely garbled. The reports are basically a long list of rows, built not from an HTML table but from nested divs, and it's as if there were a total number of rows across all files after which rows are simply drawn one over the other without moving down a line. Here's the first file (the fourth one generated) where, after the 196th row, the mess begins:
and every subsequently generated file shows like this: (all the recursive rows created by the code are printed one over the other)
Whether I zipstream the files or save them in a folder as plain PDFs, they show the exact same issue, after row 196 of the fourth file. If I change the order in which the files are generated, it happens somewhere else.
All PDFs are created correctly if I call the function just once inside the PHP code.
Since I'm calling a function inside of which the instance of DomPDF is created ($filename parameter is null, so I return the output), there shouldn't be resources going wild, should there? I also tried doing
unset($dompdf)
after the output declaration, just for kicks, but it obviously changed nothing. There are no images in the files, so I don't think clearing the cache would help either. I also increased memory_limit in php.ini from 256M to 512M, but the problem stays exactly the same.