dropbox / pyannotate

Auto-generate PEP-484 annotations

Low-overhead statistical sampling mode #6

Open · JukkaL opened this issue 6 years ago

JukkaL commented 6 years ago

Currently, collecting types has a significant performance impact, and even rewriting type collection in C/Cython would only reduce it so much. It would be nice if the performance impact were controllable, with the low end being, say, 1-2%. This would make it practical to collect types on production servers.

The motivation is that types collected during tests or while manually running a program are unlikely to be complete, and tests may use mocks and fakes that generate noise. By running in production on a large number of servers, it may be possible to collect a fairly complete picture of concrete runtime types with little effort, at least for the more commonly used functions.

A potential approach is to run the type collector on roughly every Nth call event. At least the sampling logic would have to be implemented in C (or maybe Cython?) for acceptable performance. If N is large enough, the overhead would be dominated by the cost of invoking the profiling hook plus a few machine instructions to decrement a counter and check its value.
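
To make the idea concrete, here is a minimal pure-Python sketch of the counter-based sampling hook. The names (`SAMPLE_EVERY`, `_countdown`, `_collect_types`, `_sampling_profiler`) are hypothetical and not part of pyannotate, and a real implementation would move the hot path into C/Cython as noted above:

```python
import random
import sys

SAMPLE_EVERY = 1000  # "N": collect types on roughly every Nth call event
_countdown = SAMPLE_EVERY

def _collect_types(frame):
    # Hypothetical stand-in for the real type-collection logic;
    # pyannotate's actual collector is considerably more involved.
    pass

def _sampling_profiler(frame, event, arg):
    # Hot path: decrement a counter and return immediately almost every time.
    global _countdown
    if event != 'call':
        return
    _countdown -= 1
    if _countdown <= 0:
        # Re-arm with jitter so sampling doesn't lock onto periodic call patterns.
        _countdown = random.randint(SAMPLE_EVERY // 2, SAMPLE_EVERY * 3 // 2)
        _collect_types(frame)

sys.setprofile(_sampling_profiler)
```

Even structured this way, a pure-Python hook still pays a Python-level call on every event; that per-event cost is exactly what a C implementation of the counter check would eliminate.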

It should be easy to validate the performance impact of the approach. The cost of collecting types isn't very important since we can make N large. However, at some point the collected types will be too sparse to be useful.
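
As a rough illustration of how the overhead could be measured, here is a sketch using `timeit` on a call-heavy toy workload. The `workload` function is made up for illustration, and with the pure-Python hook above the result mostly reflects hook-invocation cost rather than the cheap counter check a C version would add:

```python
import sys
import timeit

def workload():
    # Call-heavy toy workload: per-call hook overhead dominates here.
    def f(x):
        return x + 1
    total = 0
    for _ in range(100_000):
        total = f(total)
    return total

baseline = timeit.timeit(workload, number=20)

sys.setprofile(_sampling_profiler)  # hook from the sketch above
try:
    sampled = timeit.timeit(workload, number=20)
finally:
    sys.setprofile(None)

print(f"overhead: {100 * (sampled / baseline - 1):.1f}%")
```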

This issue doesn't cover how we'd aggregate types collected in multiple processes.

(The proposed approach is not my invention.)