pcre.backtrack_limit - Error with large files (and a potential "fix")

chland commented 4 years ago

While messing with the SepaDocumentor-class I ran into this error while generating the controllist-file:

The HTML code size is larger than pcre.backtrack_limit 1000000. You should use WriteHTML() with smaller string lengths.

This happens when you add too many payments to your SEPA-File. I don't know how much of an issue this really is as most people will never run into this problem.

The funny thing is that I ran into exactly the same problem with mPDF some time ago, and the little workaround I'm using in my project can be used in SepaDocumentor pretty much without changes.

Here is my modified mPDFWrapper function. When it encounters an HTML string that's too big (more than 90% of pcre.backtrack_limit), it splits the string into chunks (while trying to make sure chunks end with a </tr> if possible) and adds those chunks to the PDF one after another instead of as one giant string.

The biggest issue is that this workaround might mess up the layout if you don't use tables, because str_split then cuts the HTML at arbitrary positions, for example inside a CSS definition or an HTML tag.

Because of this problem I decided not to post this as a pull request, as it's kind of a "hacky" fix.

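    protected static function mPDFWrapper($html)
    {
        $pdf = new Mpdf();    // assumes "use Mpdf\Mpdf;" at the top of the file

        // Stay 10% below the limit so a chunk can absorb the leftover of its predecessor.
        $limit = (int) ((int) ini_get('pcre.backtrack_limit') * 0.9);

        if ($limit > 0 && strlen($html) > $limit) {
            $chunks = str_split($html, $limit);
            $count = count($chunks);

            // Move everything after the last </tr> of a chunk to the start of the next chunk.
            for ($k = 0; $k < $count - 1; $k++) {
                // Read $chunks[$k] directly: it may already contain leftover from the
                // previous chunk, so offsets must refer to the updated string.
                $last_tr = strripos($chunks[$k], '</tr>');

                if ($last_tr !== false) {
                    $last_tr += 5;    // strlen('</tr>')
                    $chunks[$k + 1] = substr($chunks[$k], $last_tr) . $chunks[$k + 1];
                    $chunks[$k] = substr($chunks[$k], 0, $last_tr);
                }
            }

            foreach ($chunks as $k => $chunk) {
                $pdf->WriteHTML($chunk);
                unset($chunks[$k]);    // free the memory as soon as the chunk is written
            }
        } else {
            $pdf->WriteHTML($html);
        }

        return $pdf->Output('', 'S');    // returns the PDF as a string
    }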

I guess it would be better to use a custom str_split that never splits inside of HTML tags, but for my use case this version works well enough.
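A rough sketch of what such a custom splitter could look like. The function name `splitHtmlAfterTags` is made up for illustration and is not part of mPDF or SepaDocumentor. It prefers to cut after a `</tr>` and otherwise falls back to cutting directly after a `>`, so tags are not cut in half. It can still cut inside a `<style>` block or an attribute value that contains `>`, so treat it as a starting point only.

```php
<?php
/**
 * Split $html into pieces of at most $limit bytes without cutting a tag in half.
 * Hypothetical helper for illustration; not part of mPDF or SepaDocumentor.
 */
function splitHtmlAfterTags(string $html, int $limit): array
{
    if ($limit < 1) {
        throw new InvalidArgumentException('$limit must be positive.');
    }

    $chunks = [];
    $offset = 0;
    $length = strlen($html);

    while ($length - $offset > $limit) {
        // Look only at the next $limit bytes.
        $window = substr($html, $offset, $limit);

        // Prefer a table row boundary.
        $cut = strripos($window, '</tr>');
        if ($cut !== false) {
            $cut += 5;    // strlen('</tr>')
        } else {
            // Otherwise cut directly after the end of any tag.
            $cut = strrpos($window, '>');
            // No '>' at all: fall back to a hard cut at the limit.
            $cut = ($cut === false) ? $limit : $cut + 1;
        }

        $chunks[] = substr($html, $offset, $cut);
        $offset += $cut;
    }

    // Whatever remains fits into one final chunk.
    if ($offset < $length) {
        $chunks[] = substr($html, $offset);
    }

    return $chunks;
}
```

Each returned chunk could then be passed to `WriteHTML()` in turn, the same way the wrapper above does with its chunks.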

AbcAeffchen commented 4 years ago

Thanks. I was not aware that the HTML string can be too large at this point. But what counts as too large here? Since the table formatter of mPDF is very slow, you would probably run into an execution timeout anyway.

I'll have a look into this. But since we generate the HTML code ourselves, it is probably easier to generate only a chunk of rows at a time and write it to mPDF, instead of generating everything and then splitting it up again.
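A minimal sketch of that idea, assuming the payments are available as an array. The function name `renderPaymentChunks` and the payment keys (`name`, `iban`, `amount`) are placeholders and do not reflect the actual SepaDocumentor code. Each chunk is rendered as its own small table, which may look slightly different from one continuous table.

```php
<?php
/**
 * Yield HTML table fragments that contain at most $rowsPerChunk payments each.
 * Function name and payment keys are assumptions made for this example.
 */
function renderPaymentChunks(array $payments, int $rowsPerChunk): Generator
{
    foreach (array_chunk($payments, $rowsPerChunk) as $group) {
        $html = '<table>';
        foreach ($group as $p) {
            $html .= '<tr>'
                   . '<td>' . htmlspecialchars((string) $p['name']) . '</td>'
                   . '<td>' . htmlspecialchars((string) $p['iban']) . '</td>'
                   . '<td>' . htmlspecialchars((string) $p['amount']) . '</td>'
                   . '</tr>';
        }
        yield $html . '</table>';
    }
}

// Usage with made-up data. With mPDF, $write would simply call $pdf->WriteHTML($html).
$payments = [
    ['name' => 'Alice', 'iban' => 'IBAN-1', 'amount' => '10.00'],
    ['name' => 'Bob',   'iban' => 'IBAN-2', 'amount' => '20.00'],
    ['name' => 'Carol', 'iban' => 'IBAN-3', 'amount' => '30.00'],
];

$written = [];
$write = function (string $html) use (&$written): void {
    $written[] = $html;
};

foreach (renderPaymentChunks($payments, 2) as $chunk) {
    $write($chunk);
}
```

Because the generator builds one fragment at a time, the full HTML never has to exist as a single string.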

chland commented 4 years ago

Keep in mind that a single payment can easily add 1 KB to the HTML string (because of the indentation, the style definition that is repeated for every payment, etc.). I did a quick test and hit the limit at somewhere around 2750 payments (which is probably WAY more than most people will ever add).

And of course you can easily optimize the HTML. I just replaced the style-definitions with a class that does the same thing and removed some spaces. This change alone already allowed my test-script to add nearly 5k payments before it crashed.
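To illustrate why this helps, here is a small size comparison. The style string and the row markup are invented for the example and are not the markup SepaDocumentor really produces, so the absolute numbers mean nothing; only the relation between the two sizes matters.

```php
<?php
// Hypothetical cell style; the real SepaDocumentor markup may differ.
$style = 'border: 1px solid #000; padding: 2px 4px; font-size: 9pt;';

// Variant 1: the style is repeated inline in every cell.
$inlineRow = '<tr><td style="' . $style . '">Example</td>'
           . '<td style="' . $style . '">10.00</td></tr>';

// Variant 2: the style is defined once and referenced through a short class name.
$css      = '<style>.c { ' . $style . ' }</style>';
$classRow = '<tr><td class="c">Example</td><td class="c">10.00</td></tr>';

$n = 1000;    // number of payments in this example
$inlineBytes = strlen(str_repeat($inlineRow, $n));
$classBytes  = strlen($css) + strlen(str_repeat($classRow, $n));

printf("inline styles: %d bytes, class based: %d bytes\n", $inlineBytes, $classBytes);
```

The fixed cost of the `<style>` block is paid once, while the saving applies to every cell of every row.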

Another idea could be to allow different file formats for the routing slip and the control list. Instead of exporting both as PDF, the control list could be exported as CSV. You would just pass something like ['routingSlipFormat' => 'pdf', 'controlListFormat' => 'csv'] to the store() function. For the control list, this would get rid of the resource-heavy PDF generation entirely.
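A CSV control list could be produced with PHP's built-in `fputcsv()`. The following is only a sketch: the function name `controlListToCsv`, the column names, and the payment keys are assumptions, and nothing like this exists in SepaDocumentor at the moment.

```php
<?php
/**
 * Build a CSV control list in memory and return it as a string.
 * Function name, columns and payment keys are assumptions made for this example.
 */
function controlListToCsv(array $payments): string
{
    // php://temp keeps the data in memory and spills to disk if it grows large.
    $handle = fopen('php://temp', 'r+');

    // Header row, semicolon separated.
    fputcsv($handle, ['name', 'iban', 'amount'], ';', '"', '\\');

    foreach ($payments as $p) {
        fputcsv($handle, [$p['name'], $p['iban'], $p['amount']], ';', '"', '\\');
    }

    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);

    return $csv;
}
```

Writing rows one by one keeps memory usage low, and there is no regular expression processing involved, so pcre.backtrack_limit does not matter here.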

But anyways: thank you very much for your SEPA-libraries!

AbcAeffchen commented 4 years ago

I see. I will have a look at that. 2750 payments does not sound that bad at the moment. I'm actually not sure whether the documentation files are really used by anyone. I have been asked about that feature a few times, but as with all of the SEPA stuff, I don't know how widely it is used.

CSV export would also be a nice option, but it is currently not planned. The payment data probably comes from a database anyway, so it is not too important to export it into another database-like file.

You are welcome. Nice to hear that you like them :)

AbcAeffchen / SepaDocumentor
