faster-cpython / ideas


Confirm collected pystats are reliable #511

Closed: mdboom closed this issue 1 year ago

mdboom commented 1 year ago

Now that pystats collection is automated, we should confirm that we trust the results. We did notice that the new results are quite a bit different from the manually generated ones from a few months ago. The expected difference is that the new ones don't include the pyperformance/pyperf machinery itself, but the actual differences are far larger than that would explain.

mdboom commented 1 year ago

I think I've found the crux of the problem: when the process is in the "stats off" state (after calling sys._stats_off()), stats aren't dumped at exit. When we run a benchmark, we surround the benchmark code with sys._stats_on() and sys._stats_off(), so nothing is dumped at the end of the process. The fact that we are getting any stats at all is a sign of leakage from non-benchmark code (which I'll treat as a separate issue that I haven't yet solved).
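To make the failure mode concrete, here is a minimal sketch of that pattern, assuming a CPython build configured with --enable-pystats (which is what exposes sys._stats_on() and sys._stats_off()); the benchmark body is a stand-in:

```python
import sys

def bench():
    # Stand-in for the real benchmark body driven by pyperformance/pyperf.
    for _ in range(1_000_000):
        pass

sys._stats_on()   # collect stats for the benchmark code only
bench()
sys._stats_off()  # the process now exits in the "stats off" state,
                  # so (per the bug above) nothing is dumped at exit
```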

There are a couple of ways to solve this:

@markshannon Thoughts?

markshannon commented 1 year ago

Stats should be dumped, even if stats are off.

Because stats are on at startup, any non-benchmark process will dump stats, so that also needs to be fixed.

Maybe start with stats off by default, and turn them on during startup if an -Xstats option is enabled?

That way the stats should work for pyperformance as it is now, but we still have the option to gather stats for a whole program.
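As a rough Python-level illustration of that proposal (the real change would live in CPython's startup code, and the option name "stats" here is just taken from this comment):

```python
import sys

# Stats are off by default; an -X option turns them on for the whole
# program, e.g.:  python -X stats my_script.py
if "stats" in sys._xoptions:
    sys._stats_on()

# Otherwise stats stay off, and pyperformance can still bracket just the
# benchmark code with sys._stats_on() / sys._stats_off().
```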

mdboom commented 1 year ago

See https://github.com/python/cpython/pull/100144

sweeneyde commented 1 year ago

It seems the .json files are not comparable to each other because they have keys like "opcode[157].pair_count[145]", whose meanings change as the bytecodes change.

Couldn't _Py_GetSpecializationStats be modified to output _PyOpcode_OpName[i] instead of the integer i? I wonder to what degree that would clean up the various scripts. It would also allow the summary script to be run by a different Python than the one whose stats are being collected.
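As a sketch of the translation the summary scripts have to do today (and which this suggestion would push into the C output instead), the numeric indices can be mapped through the stdlib opcode module; note that this only works when the script runs under the same Python that produced the stats, which is exactly the coupling being pointed out:

```python
import re
import opcode

def rename_key(key: str) -> str:
    # Both the opcode[...] index and the pair_count[...] index are
    # opcode numbers, so map each through opcode.opname.
    return re.sub(
        r"(opcode|pair_count)\[(\d+)\]",
        lambda m: f"{m.group(1)}[{opcode.opname[int(m.group(2))]}]",
        key,
    )

# The printed names depend on the running interpreter's bytecode numbering.
print(rename_key("opcode[157].pair_count[145]"))
```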

mdboom commented 1 year ago

The .json files are just shorter summaries of the raw output, which is designed to be as easy as possible to generate from C. (The C code doesn't have access to the opcode names; it would have to add a table for them, or import opcode.py, etc.)

I'm not sure there's a lot of value in making this data format well-defined and permanent. The comparison use case really only makes sense between adjacent revisions anyway. Personally, I see the requirement to use a matching version of the script as a feature: we can change the semantics of how this works as we understand the problems better, without concern for backward compatibility.

All that said, we should probably agree on a policy about this and document it somewhere.