Add multithreaded parsing

Colecf commented 8 months ago

Android has ninja files that total ~3GB in size. Android's fork of ninja has a multithreaded parser that is able to parse the ninja files in just under a second, but n2 takes over 14 seconds to parse the same files. All these numbers are for AOSP, the internal branch is roughly 50% larger.

I tried to get numbers for how fast the parsing was before 909ac6087a2d74903c1d6c3772c7e3823d4b259a, but n2 fails to parse without that commit:

n2: error: parse error: expected '\n', got ' '
out/soong/build.sdk_phone64_x86_64.ninja:807688: ...ang-tidy.sh ${ccCmd} $

Android's multithreaded implementation can be seen here. (unfortunately it's not indexed on cs.android.com)

In https://github.com/evmar/n2/pull/94 Evan expressed interest in only multithreading individual subninja files, but not chunks of a single file. This would be more work for android, as we'd have to change soong and kati to split up their ninja files, but may be possible, I need to look into it.

In addition, Android mmap's the file instead of reading it into memory, which probably helps read times. I'm not sure if this would become less effective if many smaller files had to be mmap'd / read individually.

Colecf commented 8 months ago

Evan, do you mind elaborating on why you don't want to split the file into chunks? Is it a concern about reduced performance due to searching for valid chunk boundaries and not being able to evaluate global variables as they are parsed? We could probably make it do none of that if the file is a small subninja. (But if it's small, does the performance matter?) Is the concern there the increased complexity?

evmar commented 8 months ago

Thanks for being so thoughtful about this.

I guess first as a general statement of principle I am a fan of (1) making single-threaded code as fast as possible, since that also improves multithreaded performance and (2) when possible, making code fast by making it simpler, especially with architectural simplifications. So it's appealing to me in general if there's an arrangement of files such that they can be safely independently evaluated. (Originally ninja only had 'include' but then I added 'subninja' when I realized it could allow for parallelism, not that we ever benefitted...) Similarly this is why I tried (and failed, as you discovered, whoops) to evaluate EvalStrings aggressively.

Bigger picture, n2 was mostly a toy for me to play around with, so I don't really have any strong justification for ... really any sort of technical judgement. I wrote a kind of introspective post about Ninja and one response that I found pretty insightful was someone pointing out I ought to encourage forks: here, I found it, I am 'evmar' there. I don't mention this in any way to try tell you to go away! But rather that I wonder if you might be able to make better progress if you cut me out of your decisionmaking process, given this is kind of a side hobby thing for me...

PS: re mmap, n2 currently relies on a trailing nul in the input files (see this improvement for loading) so mmap won't work. That is possible to fix (by changing how EOF is handled) but it will be some work.

Colecf commented 8 months ago

Yeah, maybe it would be better if I completed a fork with all of android's requirements, then upstreamed afterwards. I was just somewhat hesitant to do that at first because of the added work rebasing on top of upstream changes. I would like to ensure we're going in the same direction though, especially with regards to multithreading / parse time in general.

Android also uses a trailing nul byte, and does combine that with mmap. Judging based on n2's tracing output, just reading the file into memory takes over twice as long as the android fork's entire ninja invocation, so I think mmap will definitely be necessary.

evmar commented 8 months ago

Oh that is coool! I had seen some claims that mmap could be counterintuitively slower than read so I hadn't thought about this but from the code there it looks like a good idea!

Colecf commented 8 months ago

I have a WIP branch that implements multithreading (without mmap), but it doesn't actually improve the load times at all. It does seem like parsing finishes faster, but there's a lot of time spent evaluating the parsed elements afterwards.

So I may have to look into some more fundamental performance improvements.

evmar / n2

Add multithreaded parsing #108