mdboom opened 6 months ago
I suppose the solution could be as simple as running the tests with -Xuops. How would that change the fraction of ops executed in Tier 2? Or the performance of the benchmark?
The linked results are from running ./python -m test --pgo and forcing uops (-Xuops isn't enough because it isn't inherited by child processes). So we are already (effectively) doing what you suggest, but it makes Tier 2 look relatively cold. It's certainly different from the benchmarks, where Tier 2 operations are 45% of all operations.
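To illustrate the gap being discussed: Tier 2 targets long-running, type-stable loops, which microbenchmarks have and most unit tests don't. A minimal sketch (the warmup threshold mentioned in the comment is an implementation detail, not a documented guarantee):

```python
# A sketch of the kind of code Tier 2 targets: a long-running, type-stable
# loop. Under an -X uops build, the backward jump at the bottom of the loop
# warms up after repeated executions and the loop body is translated into
# micro-ops. Short-running test code rarely crosses that warmup threshold,
# which is why a test-suite PGO run leaves the Tier 2 interpreter looking cold.

def hot_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(hot_loop(1_000_000))  # 499999500000
```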
I see. I guess tests don't lend themselves that well to the kinds of loops that Tier 2 optimizes.
I think this has always been a problem with PGO -- the code it runs for training isn't that representative. (Then again, who knows whether pyperformance is. :-)
Yes, I agree on all points.
Let's see if we can do something about that? Maybe add a few self-contained tests from pyperformance to the mix?
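A self-contained training workload along those lines might look like the following sketch. The function names are illustrative only, not taken from any real pyperformance benchmark; the point is a loop-heavy task that runs long enough for hot loops to warm up into Tier 2 during PGO training:

```python
# Hypothetical, pyperformance-style workload that could be appended to the
# PGO task mix so the Tier 2 interpreter sees realistic hot paths during
# training. Names and iteration counts are illustrative assumptions.

def fib(n):
    # Iterative Fibonacci: a tight, type-stable integer loop.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def train():
    # Repeat enough times for the loops to become hot during training.
    return sum(fib(30) for _ in range(10_000))

print(train())
```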
When the PGO data is collected (by running a subset of the unit tests using ./python -m test --pgo), the coverage of Tier 2 is very low (since there aren't a lot of long-running loops, I suppose). This may make the compiler treat the Tier 2 loop as cold code and lower the performance of the Tier 2 interpreter.

A pystats run of ./python -m test --pgo is here, showing only about 10% as many ops executed in the Tier 2 interpreter as in the Tier 1 interpreter.

I'm not sure if this is worth addressing now, while we aren't too sure how we expect the Tier 2 interpreter to perform. And, of course, PGO won't affect copy-and-patch JIT'ted code.
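For context, the training run in question is driven by CPython's build configuration; a sketch of how it is invoked (exact flags and defaults vary by version, so treat this as illustrative rather than authoritative):

```shell
# Build CPython with profile-guided optimization. The training workload is
# controlled by the PROFILE_TASK make variable, which defaults to running a
# subset of the test suite via `-m test --pgo` -- the run whose pystats are
# linked above.
./configure --enable-optimizations
make PROFILE_TASK='-m test --pgo' -j8
```

Changing the training mix (e.g. to include loop-heavy workloads) would mean overriding PROFILE_TASK or changing its default.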