mdboom closed this issue 5 months ago.
In the past, we've had an issue where the mypyc-compiled C version of mypy was sometimes installed instead of the pure-Python version (which is what we want). I believe we fixed this by passing `--no-binary`, or something similar, to the `pip install` command.
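One way to confirm which flavor actually got installed is to ask the import system which loader it would use for a mypy module. pip does support `pip install --no-binary mypy mypy` to force a source build for that one package. The check below is a minimal sketch that assumes a mypyc-compiled mypy exposes its modules as C extension files (`.so`/`.pd`), which CPython's path finder prefers over `.py` files when both are present. The helper name `is_extension_module` is hypothetical.

```python
import importlib.machinery
import importlib.util


def is_extension_module(name: str) -> bool:
    """Return True if `name` would be loaded from a compiled C extension.

    Assumption: a mypyc-compiled mypy ships modules such as mypy.build as
    extension modules, while the pure-Python build only ships .py files.
    """
    spec = importlib.util.find_spec(name)  # imports parent packages if dotted
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # C extensions are handled by ExtensionFileLoader; pure-Python modules
    # use SourceFileLoader (or SourcelessFileLoader for .pyc-only installs).
    return isinstance(spec.loader, importlib.machinery.ExtensionFileLoader)
```

In the benchmark environment, `is_extension_module("mypy.build")` should then be `False` for the pure-Python build we want.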
This almost looks like either:
Not 100% sure this is the issue, but given the history of this benchmark it's the first place my mind went.
Yeah, my mind went there too, but I think I've ruled that out (as far as I can tell, we're getting the pure-Python version everywhere).
It's a problem similar to what @brandtbucher suggested: the benchmark changed, not CPython. It just isn't the C vs. Python problem. That date is exactly when this change to the benchmark was deployed to our benchmarking infrastructure. In hindsight, we probably should have renamed the benchmark, given how dramatically the change affects the results. (No blame: I reviewed that PR, IIRC.)
I think the thing to do is:
1) Remove this benchmark entirely from our dataset. This is likely to slightly improve the results on recent commits.
2) Backfill the bases with the new version of the benchmark.
3) Going forward, we should then have reliable results for this benchmark.
Closing -- the above steps are all complete.
Currently, the mypy2 benchmark is a real outlier at 2.5x slower on main than on 3.12.0. What happened to make it so much worse? Is it merely turning on Tier 2, or something else?
Plotting the data we have for this benchmark over time suggests the slowdown is not related to Tier 2: it happened at a single point in time (there is one odd outlier, though that may be an artifact of git merge history rather than anything else):
The massive slowdown happened sometime between CPython commits 3faf8e5 and 05a370a, a range of 76 commits. I'm going to bisect this to see if I can find the culprit.
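For a sense of the effort involved: bisecting is a binary search, so 76 candidate commits need at most ⌈log₂ 76⌉ = 7 benchmark runs. Assuming 3faf8e5 is the known-fast (good) endpoint and 05a370a the known-slow (bad) one, `git bisect start 05a370a 3faf8e5` sets that up, and `git bisect run <script>` can automate it. The sketch below shows only the search logic, under the assumption of a linear history with a single fast-to-slow transition (merge commits can break that assumption). `first_bad` and the `is_bad` predicate are hypothetical names. `is_bad` stands in for "build this commit, run mypy2, and compare against a threshold".

```python
from collections.abc import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Binary-search for the first bad commit, in the spirit of `git bisect`.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    history is linear with exactly one good -> bad transition.
    Returns (first_bad_commit, number_of_commits_tested).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1  # each test is one (expensive) benchmark run
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


if __name__ == "__main__":
    # Hypothetical linear history: index 0 is the good endpoint, and
    # indices 1..76 are the 76 candidate commits (index 76 is known bad).
    history = [f"c{i}" for i in range(77)]
    culprit, runs = first_bad(history, lambda c: int(c[1:]) >= 42)
    print(culprit, runs)
```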
Cc: @markshannon