mdboom opened 6 months ago
I suppose the solution could be as simple as running the tests with -Xuops. How would that change the fraction of ops executed in Tier 2? Or the performance of the benchmark?
The linked results are from running ./python -m test --pgo and forcing uops (-Xuops isn't enough because it isn't inherited by child processes). So we are already (effectively) doing what you suggest, but it makes Tier 2 look relatively cold. It's certainly different from the benchmarks, where Tier 2 operations are 45% of all operations.
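To illustrate the gap being discussed: Tier 2 targets long-running, type-stable loops, which microbenchmarks have and most unit tests don't. A minimal sketch (the warmup threshold mentioned in the comment is an implementation detail, not a documented guarantee):

```python
# A sketch of the kind of code Tier 2 targets: a long-running, type-stable
# loop. Under an -X uops build, the backward jump at the bottom of the loop
# warms up after repeated executions and the loop body is translated into
# micro-ops. Short-running test code rarely crosses that warmup threshold,
# which is why a test-suite PGO run leaves the Tier 2 interpreter looking cold.

def hot_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(hot_loop(1_000_000))  # 499999500000
```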
I see. I guess tests don't lend themselves that well to the kinds of loops that Tier 2 optimizes.
I think this has always been a problem with PGO -- the code it runs for training isn't that representative. (Then again, who knows whether pyperformance is. :-)
Yes, I agree on all points.
Let's see if we can do something about that? Maybe add a few self-contained tests from pyperformance to the mix?
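A self-contained training workload along those lines might look like the following sketch. The function names are illustrative only, not taken from any real pyperformance benchmark; the point is a loop-heavy task that runs long enough for hot loops to warm up into Tier 2 during PGO training:

```python
# Hypothetical, pyperformance-style workload that could be appended to the
# PGO task mix so the Tier 2 interpreter sees realistic hot paths during
# training. Names and iteration counts are illustrative assumptions.

def fib(n):
    # Iterative Fibonacci: a tight, type-stable integer loop.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def train():
    # Repeat enough times for the loops to become hot during training.
    return sum(fib(30) for _ in range(10_000))

print(train())
```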
When the PGO data is collected (by running a subset of the unit tests using ./python -m test --pgo), the coverage of Tier 2 is very low (since there aren't a lot of long-running loops, I suppose). This may make the compiler treat the Tier 2 loop as cold code and lower the performance of the Tier 2 interpreter.

A pystats run of ./python -m test --pgo is here, showing only about 10% as many ops executed in the Tier 2 interpreter as in the Tier 1 interpreter.

I'm not sure if this is worth addressing now, while we aren't too sure how we expect the Tier 2 interpreter to perform. And, of course, PGO won't affect copy-and-patch JIT'ted code.
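For context, the training run in question is driven by CPython's build configuration; a sketch of how it is invoked (exact flags and defaults vary by version, so treat this as illustrative rather than authoritative):

```shell
# Build CPython with profile-guided optimization. The training workload is
# controlled by the PROFILE_TASK make variable, which defaults to running a
# subset of the test suite via `-m test --pgo` -- the run whose pystats are
# linked above.
./configure --enable-optimizations
make PROFILE_TASK='-m test --pgo' -j8
```

Changing the training mix (e.g. to include loop-heavy workloads) would mean overriding PROFILE_TASK or changing its default.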