Neriderc / GVExport

Repository for GVExport module for Webtrees
GNU General Public License v2.0

Memory problem #75

Closed hartenthaler closed 2 years ago

hartenthaler commented 2 years ago

If there are about 400 persons to be shown, there may not be enough memory available, which results in this error message:

Error: abort("Cannot enlarge memory arrays. Either (1) compile with -s TOTAL_MEMORY=X with X higher than the current value 16777216, (2) compile with -s ALLOW_MEMORY_GROWTH=1 which allows increasing the size at runtime but prevents some optimizations, (3) set Module.TOTAL_MEMORY to a higher value before the program runs, or (4) if you want malloc to return NULL (0) instead of this abort, compile with -s ABORTING_MALLOC=0 "). Build with -s ASSERTIONS=1 for more info.

As far as I have tested, it depends not only on the number of persons but also on the setting of the option "Add URL". The URLs contain many characters, and that results in the error.

Maybe it is possible to store the URLs more efficiently? It should also be investigated whether this error can be caught and handled a bit more gracefully than the current red error message.

Neriderc commented 2 years ago

URLs are generated by webtrees. I'm not sure what we can really do in that case.

I'm happy to take suggestions on how to solve this.

This could also be an issue for #23 and the other client-side outputs. I've been thinking that if we get client-side output working, we should check whether GraphViz is installed on the server and use that, and fall back to client side only if it's not installed. Then this would be a matter of gracefully catching the error and showing a message about the diagram being too large to display in the browser. In the case of client-side output, we could suggest installing GraphViz on the server to be able to generate the larger diagrams.

hartenthaler commented 2 years ago

Maybe something like zip-compression can help: all the URLs have many characters in common, only the last characters of each URL are different; so it should be possible to store a placeholder for the common string of all URLs (maybe the character §) and replace this placeholder just before the string is written to a file or is used as a link.
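A minimal sketch of this placeholder idea, written in Python purely for illustration (GVExport itself is not written in Python). The function names, the example.com URLs, and the DOT-like lines are hypothetical. It assumes the placeholder character never occurs in the diagram data and that the rendering step passes the placeholder through unchanged.

```python
import os

PLACEHOLDER = "§"  # assumed not to occur anywhere else in the text


def compress_urls(dot_text: str, urls: list[str]) -> tuple[str, str]:
    """Replace the prefix shared by all URLs with PLACEHOLDER.

    Returns the shortened text and the prefix needed to undo it.
    """
    prefix = os.path.commonprefix(urls)  # character-wise common prefix
    if len(prefix) <= len(PLACEHOLDER):
        return dot_text, ""  # nothing worth replacing
    return dot_text.replace(prefix, PLACEHOLDER), prefix


def restore_urls(text: str, prefix: str) -> str:
    """Put the shared prefix back just before the text is shown or saved."""
    return text.replace(PLACEHOLDER, prefix) if prefix else text


# Hypothetical URLs, used only for illustration.
urls = [
    "https://example.com/index.php?route=%2Ftree%2Ftree1%2Findividual%2FI556",
    "https://example.com/index.php?route=%2Ftree%2Ftree1%2Findividual%2FI12",
]
dot = "\n".join(f'I{n} [URL="{u}"];' for n, u in enumerate(urls))

short, prefix = compress_urls(dot, urls)
round_trip = restore_urls(short, prefix)
```

Because the prefix is derived from the URLs themselves, this variant does not depend on knowing how the site is configured. The saving grows with the number of URLs, since each one shrinks by the length of the shared prefix.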

Neriderc commented 2 years ago

Hmm that's a great idea. I believe the memory error is from the viz.js library. If that's the case, we can identify the common part of the URL, remove it before submitting our data to the library, then add it back in just before displaying the SVG in the page.

I'm not sure a placeholder character is needed. If the common part of the URL is the first bit, we could simply remove it, then add it to the start before displaying.

My main concern would be if webtrees changes the URL format in the future, it would break, as we are building based on a specific format. Plus, we need to account for different URL settings. Installation of webtrees on a subdomain, top level domain, or path. Pretty URLs enabled or not. Perhaps other things I haven't thought of.

Neriderc commented 2 years ago

Just thinking again, we could find the root domain and the standard path, do a find and replace with a special character like you suggested, then reverse it before display. This should fail gracefully: if the URL structure changed after an update, the find and replace would do nothing and the full URL would remain. The reverse step would then not find the special character, so it would not insert the base URL.
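The graceful-failure behaviour described above can be sketched as follows. This is an illustration in Python, not GVExport code. The base URL, the placeholder, and the DOT-like strings are assumed values.

```python
PLACEHOLDER = "§"  # assumption: never appears in the diagram data itself


def shorten(dot_text: str, base_url: str) -> str:
    """Swap every occurrence of the known base URL for the placeholder."""
    return dot_text.replace(base_url, PLACEHOLDER)


def restore(svg_text: str, base_url: str) -> str:
    """Reverse shorten(); a no-op when no placeholder is present."""
    return svg_text.replace(PLACEHOLDER, base_url)


# Hypothetical values for illustration only.
base_url = "https://example.com/index.php?route=%2Ftree%2Ftree1%2Findividual%2F"
matching = f'I556 [URL="{base_url}I556"];'
changed = 'I556 [URL="https://example.com/new-scheme/I556"];'

shortened = shorten(matching, base_url)   # base URL replaced by placeholder
untouched = shorten(changed, base_url)    # format changed: nothing replaced
```

When the URL format no longer matches, both steps leave the text as it was, so the links stay intact and only the memory saving is lost.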

I'm sure this was your plan but I'm only just catching up now that I've had a chance to think about it. I think this should work.

With this issue we can look at how we catch specific errors and give specific advice. I doubt that shortening the URLs will give a lot of wiggle room. It might be failing at 400 now, with the URLs shortened it might let us have 450, but no matter what we do it will probably always have a number that it fails at. What do you think our advice to the user should be? Something simple like "Too many records to display in browser"? We should probably assign an error code to make it easier to help users troubleshoot as needed. Something like "Error B01: Too many records to display in browser"? Should we include more specific advice, suggesting the user generates the output using the server-side GraphViz or something along those lines?
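One way to picture the error-code idea is a small lookup from a fragment of the raw error text to a code and a message. The sketch below is in Python for illustration only. The table, the function name, and the matching rule are assumptions. Only the code "B01" and its wording come from this thread.

```python
from typing import Optional

# Hypothetical catalogue: fragment of the raw error -> (code, advice).
KNOWN_ERRORS = {
    "Cannot enlarge memory arrays": (
        "B01",
        "Too many records to display in browser",
    ),
}


def friendly_error(raw_message: str) -> Optional[str]:
    """Return 'Error <code>: <advice>' for a recognised error, else None."""
    for marker, (code, advice) in KNOWN_ERRORS.items():
        if marker in raw_message:
            return f"Error {code}: {advice}"
    return None  # unknown error: the caller can fall back to the raw text


raw = 'abort("Cannot enlarge memory arrays. Either (1) compile with ...")'
message = friendly_error(raw)
```

Keeping the codes in one table would also give a Troubleshooting page a stable key to look up.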

hartenthaler commented 2 years ago

Searching for the standard path is one option. My idea was:

This should be useful even if the configuration of the webtrees site and the used standard path was changed.

But you are right, this helps only a little bit. In practice, though, it could still help in many cases. I ran into the memory problem three times, and in all three cases switching the "Add URL" option off helped.

And yes, there is a need to handle memory problems more elegantly. A message like "Error B01: Too many records to display in browser" would be better than the existing error message.

ol810 commented 2 years ago

I solved the memory problem like this:

edit:

/etc/php7.4/php.ini

/etc/php8.1/php.ini

memory_limit = 512M

This works for PDF. When downloading .svg, increasing the memory limit makes no difference.

Neriderc commented 2 years ago

I'm not sure if that's the same memory issue.

@hartenthaler were you seeing the issue when generating the browser output? That's how I reproduced it.

I haven't had issues with the downloaded output no matter how many people are included (I think I've tested with approx 1500).

@ol810 what error message did you get? I'm wondering if we might want to start a "Troubleshooting" page in the wiki where we can post the errors and what people can do to fix them.

ol810 commented 2 years ago

Internal Server Error 500

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator at [no address given] to inform them of the time this error occurred, and the actions you performed just before this error.

More information about this error may be available in the server error log.

Neriderc commented 2 years ago

Great, thanks! An Internal Server Error is pretty generic, and can happen for many different reasons, but I will start a Troubleshooting wiki page and add this as a possible solution to the error.

schuco commented 2 years ago

In my experience with older GVExport versions (for wt 2.0), a memory problem came up when rendering a PDF file WITH photos on the server. A PDF without photos for, say, 100 persons will be about 40KB; with photos it may grow to more than 70MB. This may be a matter of missing thumbnails.

schuco commented 2 years ago

Today I tried the selection "anyone" for the first time. Setting the maximum number of generations of ancestors and descendants to more than 2 brought the error described. However, even with the restriction to effectively "only" 5 generations, and with any remote link in this span displayed, I could follow connections I had never noticed in my tree. This is a fantastic tool to discover connections between remote families. Thank you and congratulations for another substantial improvement of GVE. I remember that in the old GVE in times of PhpGedView there was the option to display "all". This never worked for my tree and my son skipped this option for GVE 2.0. As @hartenthaler suggested it should not be labeled as an error but rather as a restriction. There are even features in webtrees like displaying all relations between two individuals which end with an error for too big trees (the vesta version works better).

hartenthaler commented 2 years ago

@schuco To see more impressive paths in your tree you should try the extended clippings cart and the visualization in TAM and LIN. You can find links to all three custom modules in the German webtrees manual.

Neriderc commented 2 years ago

> Thank you and congratulations for another substantial improvement of GVE.

Thanks! It's good to know when features are enjoyed :)

> I remember that in the old GVE in times of PhpGedView there was the option to display "all". This never worked for my tree and my son skipped this option for GVE 2.0.

I saw the remnants in the code, but I actually had to change some parts of how the tree is built in order to get it to display those that I expected. It's possible they had a different idea of who should be included when "all" was selected.

> As @hartenthaler suggested it should not be labeled as an error but rather as a restriction. There are even features in webtrees like displaying all relations between two individuals which end with an error for too big trees (the vesta version works better).

You're right, there will always be a limit that causes an error once reached, so in that sense I will update this to an enhancement.

With this issue I intend to do the two things suggested above by @hartenthaler:

Once these two things are done, we'll consider this complete.

schuco commented 2 years ago

> @schuco To see more impressive paths in your tree you should try the extended clippings cart and the visualization in TAM and LIN. You can find links to all three custom modules in the German webtrees manual.

@hartenthaler Thank you for that very interesting advice. I have begun to upload the modules and will work with them.

schuco commented 2 years ago

Downloading a big diagram including many photos, I run into a memory problem:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 86268872 bytes) in /var/www/vhosts/schulte-coerne.de/brandneu.schulte-coerne.de/app/Http/Middleware/CompressResponse.php on line 86

There is no problem in the browser display, even with photos, and downloading works fine for the same tree when I don't include photos.

I don't understand anything about the internal handling of photos and thumbnails in webtrees and GVE, but I suspect that too much space is allocated for the thumbnails required for GVE. This is obvious when the size of a PDF for a diagram without photos is some 100 kB, whereas it may be 10 MB with photos. When you compress the file with photos the size is reduced to a value close to the file without photos.

Neriderc commented 2 years ago

Sorry I've been away, but I've read this and had a think. Also sorry, I accidentally posted a hardly-started comment that you may have got an email notification about. I've deleted it and started again.

Do you have GraphViz installed on the server? For this error message I would guess that you do, but it would be good to confirm that.

Your error message states, effectively, that your server allows 128MB of memory, and that this limit was exceeded when it tried to allocate about 82MB for this request (the rest is probably used by webtrees itself). I'm trying to think of which part of the process could use this much memory. If GraphViz is installed on the server, then compiling the PDF is done outside of PHP (which I presume is what has the memory limit), so it likely wouldn't matter (and I think there would be a different error if it were the cause).

My best guess is that this request is the download, but that would mean the file would have to be close to 82MB, because the request would not transfer anything else except a small amount of metadata to tell the browser about the file and that it should be downloaded. That's possible: you've said the file can be 10MB, and with a lot of records it could well be 82MB. If this is the case, there may not be much we can do about it. A file is as big as it is, and GraphViz is creating it.
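The figures quoted from the error message can be checked with a little arithmetic. The snippet below is Python, used only as a calculator. The two byte counts are copied from the error message earlier in the thread.

```python
MIB = 1024 * 1024

limit_bytes = 134217728      # "Allowed memory size" from the error message
requested_bytes = 86268872   # "tried to allocate" from the error message

limit_mib = limit_bytes / MIB          # exactly 128.0
requested_mib = requested_bytes / MIB  # roughly 82.3

print(f"limit: {limit_mib:.1f} MiB, requested: {requested_mib:.1f} MiB")
```

The file named in the error message is webtrees' CompressResponse.php, which is at least consistent with the guess that the large allocation happens while the response body is being prepared for download rather than while GraphViz is running.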

It makes sense that the browser render works fine. This is assembled from a bunch of pieces. The server sends back a "DOT" file, which you can select as the output to see what this looks like. It's basically a bunch of text code. The images are not included at this point. Instead, the DOT file is turned into a diagram using the browser (client) side code, then the browser makes a series of requests to the server, to download one image at a time. So it would not hit the limit because of the size of images (though there are other memory-related reasons you can get errors, hence this original post exists).

> When you compress the file with photos the size is reduced to a value close to the file without photos.

This doesn't make sense to me. Image data is already highly compressed. If you take a folder of photos and compress it as a ZIP file, you'll find it has virtually no impact on the size of the folder (and may even grow in size). The only way I'd expect a PDF document with images to be able to be compressed down to the size of one without images is if it used "lossy" compression, but then the images would be poor quality, possibly blurry or pixelated. If you're using Adobe PDF software and using the compression options (like "reduce file size") then this will be lossy compression, so you'll be losing clarity in the images.
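The point about lossless compression can be tried out with a small experiment. This is a Python sketch using the standard zlib module. As an assumption, random bytes stand in for already-compressed image data, and a repeated hypothetical link string stands in for the text part of a diagram file.

```python
import os
import zlib

size = 200_000

# Random bytes stand in for already-compressed image data (an assumption:
# real JPEG/PNG payloads are not random, but are similarly hard to shrink).
image_like = os.urandom(size)

# Repetitive text stands in for the text part of a PDF or SVG.
text_like = b'<a xlink:href="https://example.com/individual/I1">' * 4000

image_ratio = len(zlib.compress(image_like, 9)) / len(image_like)
text_ratio = len(zlib.compress(text_like, 9)) / len(text_like)

print(f"image-like data: {image_ratio:.2%} of original size")
print(f"text-like data:  {text_ratio:.2%} of original size")
```

The image-like data stays at about its original size, while the repetitive text shrinks to a small fraction. A large reduction in a PDF that is dominated by photos therefore points to the images having been resampled or re-encoded, not just losslessly packed.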

But that raises another point. As I understand it, GraphViz would just put the images in as they are. The fact that one may have a resolution of 100x100 pixels and another a resolution of 1000x1000 does not matter; they will be displayed at the same size (to fill the appropriate spot on the tile). But one may be a 100KB file and another might be 3MB. The size of the PDF would probably be at least the sum of the sizes of the images. With many images it could easily make a large file. It would be nice to scale down the images to a consistent maximum resolution based on the DPI setting; however, I cannot find any information on doing this in the GraphViz documentation.
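To make the "consistent maximum resolution based on the DPI setting" idea concrete, here is a small Python sketch of the arithmetic only. It does not resize anything and is not a GraphViz feature. The function name, the square photo box, and the example box size of 1 inch are assumptions.

```python
def target_size(width_px: int, height_px: int,
                box_inches: float, dpi: int) -> tuple[int, int]:
    """Largest size worth keeping for an image shown in a square box.

    box_inches and dpi are assumed inputs; the result keeps the aspect
    ratio and never enlarges the image.
    """
    max_px = box_inches * dpi
    scale = min(1.0, max_px / width_px, max_px / height_px)
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# A 1000x1000 photo shown in an assumed 1-inch box at 300 DPI.
large = target_size(1000, 1000, box_inches=1.0, dpi=300)
# A 100x100 photo is already small enough and is left alone.
small = target_size(100, 100, box_inches=1.0, dpi=300)
```

Any pixels beyond this target cannot show up as extra detail at that print resolution, so capping images at this size before they are embedded should cut file size without a visible loss.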

Sorry, my reply is a bit of a ramble as I'm just thinking out loud. For practical solutions, I can think of a few options.

  1. You could increase the PHP memory limit. How to do this (or whether it's possible) depends on your server. If you're using a hosting provider, you might not be able to change this. If you host your own server, then so long as the server has more memory you should be able to change this. It involves editing a php.ini file - I'd guess this is a question for your son. Earlier in this post someone had success changing memory_limit to 512M.

  2. Use a different file format. If the links are important, you could download as SVG then look for a tool to convert SVG to PDF. Inkscape is a very well known (and free) SVG editor. I've tested, it lets you export as PDF and even retains the links so they work in the PDF. However, if you have GraphViz installed on the server then you may get the same error with SVG as the images would be embedded in the same way. So the next option...

  3. Uninstall GraphViz from your server, then export as SVG and do as described in option 2 above, or alternatively export as PDF, but it won't have working links. If it's easier, don't uninstall GraphViz, but instead open the GVE file "config.php" and on line 38 remove the first two // at the start of the line. This will basically tell GVE that GraphViz is not installed, even if it is installed.

I'm sorry I don't have any better solutions at the moment. Images are pulled straight from webtrees into GraphViz, so if GraphViz doesn't provide an option to resize then that really limits our options.

schuco commented 2 years ago

Thank you for your exhaustive answer, which gives me a better understanding of the problem. I therefore started to compile some information which may help your further analysis and maybe lead to a solution of the problem.

Before I forget to answer your question: I do have graphviz on the server.

> @Neriderc As I understand it, GraphViz would just put the images in as they are. The fact one may have a resolution of 100x100 pixels and another may have a resolution of 1000x1000 does not matter, they will be displayed the same size (to fill the appropriate spot on the tile). But one may be a 100KB file and another might be 3MB. The size of the PDF would probably be at least the sum of the size of each image. With many images it could easily make a large file. It would be nice to scale down the images to a consistent maximum resolution based on the DPI setting, however, I cannot find any information on doing this in the GraphViz documentation.

To confirm your analysis I changed the diagram which resulted in the described memory error by cutting off some individuals. The resulting diagram included more than 100 individuals, 11 of them with a photo. The size of the downloaded PDF file was 57 MB. Lossy compression with PDF-Architect 8, still leaving 600 dpi graphics, gave a file size of 83KB! With PDF-Architect I can also extract embedded photos. The photo displayed for I556 in the uncompressed PDF resulted in an extracted PNG of 4MB; from the compressed PDF it was 17KB.

schuco commented 2 years ago

I also looked for the size of the photo for I556 in webtrees-applications and also tried to find the corresponding link to the photo:

  1. When I click the photo on the individual page I556 (Ferdinand Reddemann) the resulting link is: https://brandneu.schulte-coerne.de/index.php?route=%2Ftree%2Ftree1%2Fmedia-download&xref=M63&fact_id=9b8fa699f0dba8d92239f336ab6dde36&disposition=inline&mark=0 The saved graphic has 100KB

  2. Choosing ancestor chart in webtrees and clicking the photo of I556 seems to yield the same link https://brandneu.schulte-coerne.de/index.php?route=%2Ftree%2Ftree1%2Fmedia-download&xref=M63&fact_id=9b8fa699f0dba8d92239f336ab6dde36&disposition=inline&mark=0 but a graphic with just 14KB.

  3. The browser diagram of GVE gives a different link https://brandneu.schulte-coerne.de/index.php?route=%2Ftree%2Ftree1%2Findividual%2FI556%2FFerdinand-Peveling-gen-Reddemann-I556 and the saved graphic is 22KB.

  4. The photo downloaded from media object M63 has 1.2 MB. This is the same size as for the corresponding file on the server.

Therefore I come to the maybe naive question: Can webtrees provide photos in different sizes for different purposes? I remember that in webtrees 1 there were physically two different files for each photo: one was the uncompressed photo, the other, in a different folder, a strongly compressed thumbnail. It was my understanding that these thumbnails were used for charts like the ancestry chart and also for GVE. Photos in this type of chart do not need a very high resolution. What happened to thumbnails in webtrees 2.0? It seems that internally something like thumbnails can be used, e.g. for the ancestry chart, but these cannot be referenced from outside, e.g. by GraphViz.

Neriderc commented 2 years ago

Thanks for doing that research. It seems the best solution here would be to use resized images which should lead to drastically smaller file sizes.

The problem is with how GraphViz accesses the photos. It is not accessing the photos through webtrees, instead we request the server file location from webtrees, then GraphViz grabs that file directly off the hard drive. So it is not as simple as asking webtrees to resize it.

To solve this problem would probably require a solution such as:

The client side SVG export should do a similar thing but the process would be very different. Other client side formats are not affected as they do not store the image data directly.

I think this is feasible, I'll have a go at it when I get the time. I unfortunately have a lot less time now than I did a few weeks ago.

When I get a chance I'll also split this issue into multiple, as there are different parts to this.

  1. Compress URLs
  2. Add nicer error message when limit hit
  3. Resize images for server side SVG and PDF
  4. Resize images for client side SVG

Neriderc commented 2 years ago

I'll close this as we now have separate issues for each action from here.