dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

JIT: possibly revisit how we set assertion prop table size #10591

Open AndyAyersMS opened 6 years ago

AndyAyersMS commented 6 years ago

Some data on how often we drop assertions (from investigations into dotnet/runtime#10461, via pmi over fx):

| Table Size | Num Methods | Avg Assertion Count | Methods with Dropped Assertions |
| ---: | ---: | ---: | ---: |
| 64 | 159493 | 4.6 | 222 |
| 128 | 1801 | 42.3 | 28 |
| 256 | 237 | 81.2 | 4 |

Overall this doesn't seem too bad -- only about 0.16% of methods lost assertions from table size limits. I unfortunately didn't track how many of the size 64 cases were ones where we deliberately limited the size for throughput reasons.

Table size is determined by looking at compILCodeSize, which does not take code added by inlining into account (among other things). So it seems like we could perhaps find a better size predictor.
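For illustration, a bin-based sizing heuristic keyed on IL size might look like the sketch below. The function name, thresholds, and bin sizes are hypothetical, chosen only to mirror the 64/128/256 bins from the table above; they are not the JIT's actual values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: pick an assertion table size from IL code size.
// The bins (64/128/256) match the existing table sizes; the IL-size
// thresholds are illustrative guesses, not the JIT's actual cutoffs.
static uint32_t ChooseAssertionTableSize(uint32_t ilCodeSize)
{
    if (ilCodeSize <= 128)
    {
        return 64; // small methods: average assertion count is ~4.6
    }
    if (ilCodeSize <= 1024)
    {
        return 128;
    }
    return 256; // current cap; larger methods may still drop assertions
}
```

The weakness this issue points out is that `ilCodeSize` here is the pre-inlining size, so a small method that inlines aggressively can land in the 64 bin and drop assertions.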

Or perhaps just allow the table to grow?

Here's a breakdown of how many methods would fit into our existing size bins:

| Assertion Count | Methods |
| ---: | ---: |
| 0 - 64 | 161835 |
| 65 - 128 | 585 |
| 129 - 256 | 104 |
| more than 256 | 21 |

Max number of assertions seen was 615 (in a method that used a size 64 table), in Microsoft.CodeAnalysis.VisualBasic.VisualBasicCommandLineParser:Parse(ref,ref,ref,ref):ref:this. So 551 of these were dropped.

There is obviously some consideration of the overall cost of the subsequent dataflow analysis too, which is roughly proportional to max assertion count * number of blocks. So if we allow the max assertion count to scale up with the number of blocks we potentially see some quadratic behavior. But this seems perhaps solvable too -- we generate all (or more) assertions initially, and then limit the ones we use only if we see bad scaling.

This might let us also try and prioritize which assertions should survive, though I don't have any particular idea yet how to formulate that (something like the value number equivalent of weighted ref count usage ...?).

Note that today we actually generate all assertions. Assertion gen doesn't early-out once the table is full (though perhaps it should); it just stops recording assertions once the table fills up. So we are already paying the cost of finding all these assertions, and the table size itself is not a burden.

So a rough proposal would be: find and record all assertions, allowing the table to grow larger (perhaps with some size limit); have a budget based on blocks * assertions; and, if over budget, start trimming assertions somehow to get under budget.
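The trimming step of this proposal could be sketched as below. Everything here is hypothetical: the `Assertion` struct, the `priority` field (standing in for some usefulness estimate, e.g. the weighted-ref-count idea mentioned later), and the budget formula are assumptions for illustration, not existing JIT code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical assertion record; "priority" stands in for some estimate
// of how useful the assertion is likely to be during propagation.
struct Assertion
{
    uint32_t id;
    uint32_t priority;
};

// Sketch of the proposal: keep all generated assertions unless the
// dataflow cost (roughly blocks * assertions) exceeds a budget, then
// drop the lowest-priority assertions until back under budget.
static void TrimToBudget(std::vector<Assertion>& table, uint32_t numBlocks, uint32_t budget)
{
    // Sort so the highest-priority assertions are kept first.
    std::sort(table.begin(), table.end(),
              [](const Assertion& a, const Assertion& b) { return a.priority > b.priority; });

    while (!table.empty() && numBlocks * table.size() > budget)
    {
        table.pop_back(); // drop the current lowest-priority assertion
    }
}
```

With, say, 10 blocks and a budget of 50, a table of 10 assertions would be trimmed to the 5 highest-priority entries; a method under budget keeps everything, which preserves today's behavior for the ~99.8% of methods that never overflow.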

category:implementation theme:assertion-prop skill-level:intermediate cost:medium impact:small

mikedn commented 6 years ago

Should the number of assertions that are actually used be looked at as well? I suppose it's quite possible that we generate more assertions than needed.

FWIW I started a RangeCheck prototype that does not use existing assertions but rather extracts them on demand. I need to go back to it to see what it would take to bring it to a reasonable functionality level to allow some measurements.

AndyAyersMS commented 6 years ago

Started to look at how many assertions are used with some ad-hoc instrumentation [raw data]. Here's a summary plot. The X axis is the number of assertions created; Y is the number used. Box-and-whisker marks show the distribution of Y for various X's. Red lines give the 0.50, 0.90, and 0.95 linear quantile estimates.

Upshot is that typically about 1 in 10 assertions gets used; 95th-percentile usage is about 1 in 5. There are some outliers, but overall across corelib the maximum number used was 22 (out of 161).

1551 of the 28649 methods used all their assertions, but none of those cases generated more than 5 assertions.

So it would appear most assertions are not useful (at least to assertion prop).

[plot: assertions created (X) vs. assertions used (Y)]

erozenfeld commented 6 years ago

@dotnet/jit-contrib

kunalspathak commented 1 year ago

Some related work done in https://github.com/dotnet/runtime/pull/73035 and https://github.com/dotnet/runtime/pull/72700