PressForward / pressforward

PressForward is a free plugin that provides an editorial workflow for content aggregation and curation within the WordPress dashboard. It is designed for bloggers and editorial teams who wish to collect, discuss, and share content from a variety of sources on the open web.
GNU Affero General Public License v3.0

Handle very large feeds #1040

Open stakats opened 5 years ago

stakats commented 5 years ago

A very large feed will hang retrieval with this error: PHP Fatal error: Allowed memory size of NNNNNNN bytes exhausted (tried to allocate NNNNNNNNN bytes) in /websites/dhnow/www/wp-includes/wp-db.php on line 1978.

See, for example, the 9.5MB feed at http://kbender.blogspot.com/feeds/posts/default

AramZS commented 5 years ago

Yikes, I'm not sure why a site would have a 9.5MB feed, but this is a server-level setting. PressForward can't change the allowed memory size of a server's PHP configuration.

AramZS commented 5 years ago

@stakats We can use ini_set('memory_limit','24M'); for example, but the server may not allow a WordPress plugin to escalate that setting. Could you try adding that line to your wp-config.php file and see if it resolves the issue?

lordmatt commented 5 years ago

It should be possible to load a large file in chunks, as this plugin does: https://en-gb.wordpress.org/plugins/tuxedo-big-file-uploads/ - in theory, the steps would be (1) detect that the memory limit was reached, (2) handle the error, and (3) switch to large file upload. Of course, processing a file of that size is another matter altogether. It might be best just to acknowledge the error gracefully and mark the feed as broken?
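One way to sketch the "acknowledge the error gracefully" option: PHP raises memory exhaustion as a fatal error that try/catch cannot intercept, so the sketch below swaps in a pre-flight check instead of detecting the failure after the fact. It compares the feed size against the memory still available before parsing starts. The function names and the overhead factor of 10 are assumptions made for illustration; they are not part of PressForward, WordPress, or SimplePie, and the real parsing overhead would need to be measured.

```php
<?php
/**
 * Hypothetical helper: convert a php.ini shorthand value such as "128M"
 * into bytes. "-1" (no limit) is reported as PHP_INT_MAX.
 */
function pf_sketch_memory_limit_bytes( string $limit ): int {
	$limit = trim( $limit );
	if ( '' === $limit || '-1' === $limit ) {
		return PHP_INT_MAX;
	}

	$value = (int) $limit; // "128M" casts to 128.
	switch ( strtolower( substr( $limit, -1 ) ) ) {
		case 'g':
			$value *= 1024; // Intentional fall-through.
		case 'm':
			$value *= 1024; // Intentional fall-through.
		case 'k':
			$value *= 1024;
	}

	return $value;
}

/**
 * Hypothetical helper: guess whether a feed of $feed_bytes can be parsed
 * in the memory that is left. $overhead_factor is an assumed multiplier
 * for how much larger the parsed feed is than the raw XML.
 */
function pf_sketch_feed_fits_in_memory(
	int $feed_bytes,
	int $limit_bytes,
	int $used_bytes,
	int $overhead_factor = 10
): bool {
	$available = $limit_bytes - $used_bytes;

	return $feed_bytes * $overhead_factor <= $available;
}
```

A caller could pass in pf_sketch_memory_limit_bytes( ini_get( 'memory_limit' ) ) and memory_get_usage( true ), then mark the feed as broken instead of parsing it when the check returns false.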

I've checked the feed the OP mentioned. I have no idea what they are doing, but the embed code they are using on the blog is something else (VERY heavy). I've never seen anything like it before.

boonebgorges commented 1 year ago

As suggested by @lordmatt, the issue here is not really the fetching of the XML file, but the parsing of that file.

There are some PHP libraries for reading XML files in chunks, and PHP natively has the XMLReader class for streaming an XML file. But these techniques are not compatible with the SimplePie library that WordPress uses to parse feeds. And unfortunately, this is not likely to be addressed in SimplePie. See, e.g., https://github.com/simplepie/simplepie/issues/598, https://core.trac.wordpress.org/ticket/45303, https://github.com/simplepie/simplepie/issues/731

As such, the only viable way forward is to move away from SimplePie. I'm having a hard time finding a PHP library that uses XMLReader to stream in a scalable way (here's a very old proof of concept: https://stackoverflow.com/questions/925300/parsing-media-rss-using-xmlreader), so we'd probably have to write our own, or contribute to a project like SimplePie to help them make the improvement. This is going to be a very large task.
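To illustrate what streaming with XMLReader could look like, here is a minimal sketch, not a replacement for SimplePie. It walks the feed node by node and copies only one item or entry subtree into memory at a time. The function names are hypothetical, only the title is extracted, and real feeds would also need namespace, encoding, and error handling that is left out here.

```php
<?php
/**
 * Hypothetical helper (not part of PressForward, WordPress, or SimplePie):
 * yields one feed item at a time, so only a single <item> (RSS) or
 * <entry> (Atom) subtree is held in memory at once.
 */
function pf_sketch_stream_feed_items( string $path_or_url ): Generator {
	$reader = new XMLReader();
	if ( ! $reader->open( $path_or_url ) ) {
		throw new RuntimeException( 'Could not open feed: ' . $path_or_url );
	}

	try {
		$more = $reader->read();
		while ( $more ) {
			$is_item = XMLReader::ELEMENT === $reader->nodeType
				&& in_array( $reader->localName, array( 'item', 'entry' ), true );

			if ( ! $is_item ) {
				$more = $reader->read();
				continue;
			}

			$expanded = $reader->expand();
			if ( false === $expanded ) {
				break; // Malformed item: stop rather than guess.
			}

			// Copy just this item's subtree into a small standalone document.
			$doc  = new DOMDocument();
			$item = $doc->importNode( $expanded, true );

			yield array(
				'title' => pf_sketch_child_text( $item, 'title' ),
			);

			// Move past the subtree just handled instead of descending into it.
			$more = $reader->next();
		}
	} finally {
		$reader->close();
	}
}

/**
 * Hypothetical helper: text of the first descendant element with the
 * given name, or an empty string if there is none.
 */
function pf_sketch_child_text( DOMElement $item, string $name ): string {
	$nodes = $item->getElementsByTagName( $name );

	return $nodes->length > 0 ? trim( $nodes->item( 0 )->textContent ) : '';
}
```

Looping over pf_sketch_stream_feed_items() with foreach processes items as they are read, so peak memory depends on the largest single item, not on the size of the whole feed.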

So, for the time being, I think it'll have to be a documented issue: If you have problems with your server conking out due to large feeds, you'll have to use WP's tools for increasing the memory limit (i.e., the WP_MEMORY_LIMIT constant in your wp-config.php).
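For reference, the constant would be set roughly like this. The value of 256M is only an assumed example, not a recommendation from this thread, and the host's PHP configuration may still cap or ignore it.

```php
<?php
// In wp-config.php, above the line that loads wp-settings.php.
// '256M' is an assumed example value; choose one your host permits.
define( 'WP_MEMORY_LIMIT', '256M' );
```

WordPress applies this value while it boots, so the line needs to come before wp-settings.php is loaded.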