WordPress / phpdoc-parser

Documentation parser powering developer.wordpress.org
https://developer.wordpress.org/reference/

Reduce memory requirements #169

Open lkwdwrd opened 8 years ago

lkwdwrd commented 8 years ago

Right now the parser has very large memory requirements. I had to up my local install's available memory to run the parsing for core: 2GB on the VM and 1GB for PHP. This has some headroom, but it regularly takes around 400MB to parse core initially and 320MB on a subsequent update run. The import script can run hundreds of thousands of queries, and, simply put, PHP isn't meant to run for long periods of time, so it leaks.

I would love to see some work done to audit where the memory leaks are and shore up the worst ones. I did a little bit of exploring and was able to determine that parsing should never be done with the SAVEQUERIES constant set to true. That should have been obvious, but I have it on for all my dev sites and didn't think about the saved queries eating up memory during the CLI call. Turning this off alone halves the memory requirement (~800MB before turning it off).
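For reference, SAVEQUERIES makes `$wpdb` append every query it runs to `$wpdb->queries`, which is never emptied during a long CLI request. A minimal sketch of guarding against that during a parse run; the function name and the WP-CLI warning are hypothetical, not something the plugin currently does:

```php
<?php
// Hypothetical guard at the top of the CLI parse command: warn if SAVEQUERIES
// is enabled, since $wpdb->queries will then keep every query of a run that
// may execute hundreds of thousands of them.
if ( defined( 'SAVEQUERIES' ) && SAVEQUERIES ) {
	WP_CLI::warning( 'SAVEQUERIES is enabled; parsing will use far more memory.' );
}

// If the constant cannot be turned off for the run, periodically emptying the
// accumulated query log keeps it from growing without bound.
function wporg_dp_flush_query_log() {
	global $wpdb;

	if ( defined( 'SAVEQUERIES' ) && SAVEQUERIES ) {
		$wpdb->queries = array();
	}
}
```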

Nothing else I have explored has helped much, and some of it even increased the requirements. It'll take some digging because memory tools in PHP are rudimentary at best.

atimmer commented 8 years ago

From looking into this earlier I think it comes down to having a representation of all parsed content in a giant object that is then passed to a function that imports it into WordPress.

I think the only meaningful way to reduce memory footprint is to stop doing that.
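For illustration only, a minimal sketch of what "stop doing that" could look like: a generator that yields one file's parsed data at a time, so the importer never holds the full representation. `parse_file()` and `import_file()` are stand-ins for whatever the exporter and importer currently do, not the plugin's actual API.

```php
<?php
// Hypothetical reshaping of the parse step: instead of returning one huge
// array describing every file, yield one file's data at a time so the
// importer only ever holds a single file's parse tree in memory.
function parse_files_lazily( array $files, $root ) {
	foreach ( $files as $filename ) {
		// parse_file() stands in for whatever produces the per-file array
		// (functions, classes, hooks, docblocks) in the current exporter.
		yield parse_file( $filename, $root );
	}
}

// The importer would then consume the generator one file at a time:
// foreach ( parse_files_lazily( $files, $root ) as $file_data ) {
//     $importer->import_file( $file_data );
// }
```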

lkwdwrd commented 8 years ago

It definitely starts that way. Then usage doubles or more during the import itself.


JDGrimes commented 8 years ago

What if we parsed and stored the data for each file separately? That way we'd only ever need to have one file in memory at a time, and we could then import the files one at a time too. It would also perhaps make it easier for us to allow partial parsing in the future, for example only reparsing the files that have changed since the last run.

So, instead of one giant JSON file, we'd have a bunch of smaller ones in a directory.
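As a rough sketch of the per-file idea (all names here are hypothetical, not the plugin's actual API), the export step could write one JSON file per source file and record a hash so unchanged files can be skipped on the next run:

```php
<?php
// Hypothetical export step that writes one JSON file per parsed source file
// instead of a single giant output file. A hash of the source lets a later
// run skip files that have not changed.
function export_file_to_json( $source_file, array $parsed_data, $output_dir ) {
	$hash   = md5_file( $source_file );
	$target = trailingslashit( $output_dir ) . str_replace( '/', '__', $source_file ) . '.json';

	// Skip re-writing (and later re-importing) unchanged files.
	if ( file_exists( $target ) ) {
		$existing = json_decode( file_get_contents( $target ), true );
		if ( isset( $existing['hash'] ) && $existing['hash'] === $hash ) {
			return false;
		}
	}

	$parsed_data['hash'] = $hash;
	file_put_contents( $target, wp_json_encode( $parsed_data ) );

	return true;
}
```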

JDGrimes commented 8 years ago

Or we could keep one file but stream to/from it instead of pulling the whole thing into memory at once (sort of like what is being done with the WordPress Importer plugin).
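One low-tech way to stream would be newline-delimited JSON: write one JSON object per parsed file on its own line during export, then decode a single line at a time on import instead of `json_decode()`-ing the whole dump. A hedged sketch, with the function and callback names made up for the example:

```php
<?php
// Hypothetical streaming reader for a newline-delimited JSON export: each
// line holds one file's parsed data, so only one record is decoded at a time.
function import_from_ndjson( $path, callable $import_file ) {
	$handle = fopen( $path, 'r' );
	if ( ! $handle ) {
		return new WP_Error( 'unreadable_export', "Could not open {$path}" );
	}

	while ( ( $line = fgets( $handle ) ) !== false ) {
		$file_data = json_decode( $line, true );
		if ( null !== $file_data ) {
			$import_file( $file_data );
		}
	}

	fclose( $handle );

	return true;
}
```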

lkwdwrd commented 8 years ago

Great ideas! This could definitely go hand in hand with making the import process a lot more modular as well. Right now it's kind of an importing god object with a massive god function inside it.

Doing the import in bits would also allow us to do some cleanup between each file, even in the create workflow. We'll likely still need to work out why we're leaking so much memory on the WP side of things (I don't think that's anything the importer is doing), but it's a good start.
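For what it's worth, a sketch of the kind of between-files housekeeping that other long-running WordPress imports tend to do; the wrapper function name is hypothetical, but `wp_cache_flush()`, `gc_collect_cycles()`, and `$wpdb->queries` are real:

```php
<?php
// Hypothetical housekeeping run between imported files: drop caches and logs
// that only grow during a CLI request, then let PHP collect reference cycles.
function wporg_dp_free_memory_between_files() {
	global $wpdb;

	// The query log only accumulates when SAVEQUERIES is on, but emptying it is cheap.
	$wpdb->queries = array();

	// Reset the in-memory object cache so post/term lookups from the previous
	// file do not pile up for the rest of the run.
	wp_cache_flush();

	// Ask PHP to collect any reference cycles left behind.
	if ( function_exists( 'gc_collect_cycles' ) ) {
		gc_collect_cycles();
	}
}
```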

JDGrimes commented 8 years ago

https://blackfire.io/ might be helpful here.