brownplt / pyret-lang

The Pyret language.
Other
1.07k stars 109 forks source link

checkInternalArgs considered harmful #1144

Closed jpolitz closed 6 years ago

jpolitz commented 7 years ago

This program currently runs 1.5-2x as fast on master as it does on horizon:

https://code.pyret.org/editor#share=0B32bNEogmncORjV6QlVhcXlNYzg&v=64c077c

image

Looking at commit histories for build times of the compiler, and doing some commenting out of key lines in runtime, convinces me that the culprit is this commit and the ones around it:

https://github.com/brownplt/pyret-lang/commit/ad3de8147eeef5fa4fde075de7ec3f4e18792d72

They allocate far too many arrays, which causes memory churn, and the call itself just does a bunch of computation.

This is a huge cost to improve these errors. @blerner suggested:

  1. "Transpose" its arguments, so that instead of modName :: String, funName :: string, args :: Array<String>, anns :: Array<Ann>, it takes modName, funName, argAnns :: Array<[array: String, Ann]>,
  2. Flatten those argAnns,
  3. Change the signature to function checkArgsInternal(modName, funName, ...argAnns), and require that argAnns.length % 2 == 0
  4. Hoist the one remaining allocation, of [modName], out of the forEach loop

This might help – @jswrenn want to take a whack? There's probably a bunch of patterns like this surrounding errors that can cause substantial slowdowns and require care to not make performance regressions, so it might be useful for you to try it.

This also reminds me how bad I've been about setting up an automated way to learn about performance regressions, which is a TODO for me.

blerner commented 7 years ago

Another observation: doing this will cause checkArgsInternal to be variadic, and therefore megamorphic, which will hurt the jit. It might be worth splitting it out into checkArgsInternal1, ... checkArgsInternal4 or so, that are each of a unique arity and take (2*their number) arguments accordingly, so that each one can be jit-friendly.

jpolitz commented 7 years ago

@blerner one issue with this proposal is that the way this works internally, it still needs to slice out the actual argument values into an array for the error-reporting infrastructure. It doesn't kill the idea, but it forces refactoring deeper than just to checkArgsInternal.

More evidence of slowness (purple is from the commit before this change):

http://pyret-pitometer.s3-website-us-east-1.amazonaws.com/view/sample.html?branch=horizon&branch=before-internal-args-change&label=string-dict&ymin=1000000&ymax=2000000

http://pyret-pitometer.s3-website-us-east-1.amazonaws.com/view/sample.html?branch=horizon&branch=before-internal-args-change&label=bench-program&ymin=0

Especially note that string-dicts got slower. There's about a second of startup time on these runs, so the difference between 1.8 and 2.1 seconds is more like the difference between 0.8 and 1.1

blerner commented 7 years ago

I don't think we care about that, since that slicing of the even-indexed elements into their own array should only have to happen internally to checkArgsInternal, and only in the error case, right? So the successful fast path still is fast, and you don't have to change call sites to cAI... (Unless you also split it into the arity-specific versions.)

jpolitz commented 7 years ago

Completely agree. If you start chasing down the error case, you quickly find a bunch of helpers that need to be refactored because they expect to get arrays as arguments. Hence more refactoring needed.

blerner commented 6 years ago

is this fixed by 85efa31744bd17856f8765e11adffa893c0f4fbd?

blerner commented 6 years ago

Closing