BookStackApp / BookStack

A platform to create documentation/wiki content built with PHP & Laravel
https://www.bookstackapp.com/
MIT License

HTML export taking longer than 1 minute #5048

Open jonathon2nd opened 2 months ago

jonathon2nd commented 2 months ago

Describe the Bug

Attempting an HTML export fails after one minute with a 504 error.

Steps to Reproduce

Using either export-books.php or the UI:

Attempt to generate an HTML export of a book.

Expected Behaviour

HTML would be downloaded.

Screenshots or Additional Context

The txt download is ~533kB

Log from console.

2024-06-03T18:51:46.963675560Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
2024-06-03T18:51:47.105897343Z 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  2572  100  2572    0     0  18402      0 --:--:-- --:--:-- --:--:-- 18503
2024-06-03T18:53:00.705073606Z PHP Warning:  file_get_contents(http://bookstack-service.wiki/api/books/28/export/html): Failed to open stream: HTTP request failed! HTTP/1.1 504 Gateway Time-out
2024-06-03T18:53:00.705112372Z  in /export-books.php on line 74

Browser Details

No response

Exact BookStack Version

v24.05.1

jonathon2nd commented 2 months ago

PDF export also times out.

ssddanbrown commented 2 months ago

Hi @jonathon2nd, exports can take a while if there's a lot of content, and in rare cases specific content can trip up the export system and cause more work than expected. Really, this is the kind of thing I'd need to replicate with the same content to actually test.

Do other books in the system also time out, even simple ones? You could clone the book and delete parts of it to help identify whether it's mainly down to a specific page or set of pages.
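A quicker alternative to cloning and trimming the book is to export each page on its own and rank the results by size. The sketch below assumes the BookStack REST API's per-page endpoint `GET /api/pages/{id}/export/html` with `Authorization: Token <id>:<secret>` auth; the helper names, the token values and the list of page IDs are hypothetical and must be supplied by you (for example from `GET /api/pages?filter[book_id]=28`).

```python
import time
import urllib.request
from typing import Callable, Iterable, List, Tuple


def make_page_fetcher(base_url: str, token_id: str, token_secret: str,
                      timeout: float = 300.0) -> Callable[[int], bytes]:
    """Return a function that downloads one page's HTML export via the API.

    Hypothetical helper: base_url, token_id and token_secret are placeholders.
    """
    def fetch(page_id: int) -> bytes:
        req = urllib.request.Request(
            f"{base_url.rstrip('/')}/api/pages/{page_id}/export/html",
            headers={"Authorization": f"Token {token_id}:{token_secret}"},
        )
        # A generous client-side timeout so slow pages are measured, not cut off.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    return fetch


def rank_pages_by_export_size(
    page_ids: Iterable[int], fetch: Callable[[int], bytes]
) -> List[Tuple[int, int, float]]:
    """Export each page separately; return (page_id, size_bytes, seconds), largest first."""
    results = []
    for page_id in page_ids:
        start = time.perf_counter()
        body = fetch(page_id)
        elapsed = time.perf_counter() - start
        results.append((page_id, len(body), elapsed))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Passing the fetcher in as a function keeps the ranking logic testable offline, e.g. `rank_pages_by_export_size([1, 2, 3], {1: b"a", 2: b"bbb", 3: b"bb"}.__getitem__)` lists page 2 first.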

M0n7y5 commented 2 months ago

Check your logs. You may need to raise the memory limit or the execution timeout in php.ini.

jonathon2nd commented 2 months ago

@M0n7y5 Both had already been increased. I am now running into a Cloudflare timeout. There are no errors in the container logs.

@ssddanbrown We have no other books that time out. Once the book is split up, we will export each part and see whether the problem is caused by the type of content rather than the size of the book.

The txt download is ~533kB.
The md download is ~775kB.

jonathon2nd commented 2 months ago

The book has been refactored, but it still fails to export to HTML within 1 minute.

txt export size: ~150kB
md export size: ~250kB

I am able to export each page individually.

M0n7y5 commented 2 months ago

You need to tell Cloudflare to wait longer for the server to respond. Cloudflare thinks the server is down while your book is converting to PDF.

M0n7y5 commented 2 months ago

Also, one page taking 120MB is crazy. What kind of content do you have on your pages?

jonathon2nd commented 2 months ago

Lots of photos.

What's strange is that those couple of huge individual pages take no more than ~3 seconds, and most others were instant. So I'm not sure why the book export explodes.
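One reason photo-heavy pages balloon in a single-file HTML export is base64 encoding. Assuming images are inlined as base64 data URIs (the usual way to keep one HTML file self-contained), every 3 bytes of image data become 4 characters of text, so the export is about a third larger than the photos themselves, and a whole-book export carries every page's images in one document. The helper names below are illustrative only, and the 90 MiB figure is a hypothetical input, not a measurement from this book.

```python
import base64


def base64_len(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def data_uri(image: bytes, mime: str = "image/jpeg") -> str:
    """Inline an image as a data URI, as a self-contained HTML file would."""
    return f"data:{mime};base64," + base64.b64encode(image).decode("ascii")


# Hypothetical: 90 MiB of photos on a single page.
photos = 90 * 1024 * 1024
print(base64_len(photos) / (1024 * 1024))  # 120.0 (MiB of text in the HTML)
```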

ssddanbrown commented 2 months ago

Yeah, 120MB is super high. If the pages are exporting quickly, that might indicate hitting some kind of memory limit or exhaustion, or just that the HTML is too large to handle without problems. There might be a more efficient way for us to do the embedding/parsing (placeholders, then simple string replacements at the end), but at those kinds of sizes I'd be surprised if other issues didn't pop up anyway. The formats we produce aren't really great for image- or data-heavy content, tbh.
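The "placeholder then simple string replacements at the end" idea can be sketched as follows. This is a minimal Python illustration of the general technique, not BookStack's (PHP) code; the function names, the `__IMG_n__` token format and the image loader are hypothetical. Image URLs are swapped for short tokens so any HTML processing only ever sees small strings, and the heavy base64 payloads are spliced in during one final regex pass that builds the output once.

```python
import base64
import re
from typing import Callable, List, Tuple

# Matches src="..." on <img> tags in simple, well-formed HTML (sketch only).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)
# Placeholder tokens; assumes page text never contains this pattern itself.
TOKEN = re.compile(r"__IMG_(\d+)__")


def swap_in_placeholders(html: str) -> Tuple[str, List[str]]:
    """Replace each image URL with a short token; return new HTML and the URLs."""
    urls: List[str] = []

    def repl(m: re.Match) -> str:
        urls.append(m.group(2))
        return f"{m.group(1)}__IMG_{len(urls) - 1}__{m.group(3)}"

    return IMG_SRC.sub(repl, html), urls


def embed_images(html: str, urls: List[str],
                 load: Callable[[str], Tuple[bytes, str]]) -> str:
    """Final pass: swap each token for a base64 data URI in a single sweep."""
    def repl(m: re.Match) -> str:
        data, mime = load(urls[int(m.group(1))])
        return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"

    return TOKEN.sub(repl, html)
```

Usage: `html, urls = swap_in_placeholders(page_html)`, run any parsing on the now-small `html`, then `embed_images(html, urls, load)` where `load` returns `(bytes, mime_type)` for a URL.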

M0n7y5 commented 2 months ago

The issue here is that parsing HTML takes a lot of memory, and converting it to PDF is a CPU-intensive task because all of this is done in an old PHP library. PHP itself is just slow. I solved my issue by using https://gotenberg.dev/ and overriding the PDF export. It also fixes a lot of weird Unicode issues. It uses headless Chrome under the hood.
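How the PDF export was overridden isn't described above, so the following only shows the Gotenberg side: a Python sketch that posts an HTML document to Gotenberg's Chromium HTML route, assuming a Gotenberg 7/8 instance on its default port 3000. Gotenberg expects the main document to be uploaded as a form file named `index.html`; the function names are hypothetical.

```python
import urllib.request
import uuid


def build_gotenberg_request(html: bytes,
                            base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build a multipart POST to Gotenberg's Chromium HTML-to-PDF route."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # The main document must be uploaded under the filename index.html.
        b'Content-Disposition: form-data; name="files"; filename="index.html"\r\n',
        b"Content-Type: text/html\r\n\r\n",
        html,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/forms/chromium/convert/html",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def html_to_pdf(html: bytes, base_url: str = "http://localhost:3000",
                timeout: float = 300.0) -> bytes:
    """Send the request and return the PDF bytes (needs a running Gotenberg)."""
    with urllib.request.urlopen(build_gotenberg_request(html, base_url),
                                timeout=timeout) as resp:
        return resp.read()
```

Splitting request construction from sending keeps the multipart encoding checkable without a running Gotenberg container.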