This is the first part of #31, and contains more math than it strictly needs.
Warm up the VM by:
Find a base sample size (recognizing that this will be naive and affected by JIT / cold everything)
Take 10 samples
Take 10 more samples (yes, twice)
Correlate the two sets of 10 samples. If they correlate above 5%, trend towards failure; otherwise, trend towards success.
Take one more sample and repeat, comparing the oldest and second-to-oldest until…
After 5 successes, call it good and move on. After 50 failures, assume the function will never be warm or free from error, and fail out. (This may change to failing in some more interesting way later, or change so that the runner handles this specific failure and warns the user but proceeds.)
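
For reviewers who want the control flow at a glance, here is a rough Python sketch of the warm-up loop. It is an illustration, not the code in this PR. The names `warm_up`, `take_sample`, `differs`, and `WarmupError` are hypothetical. The "correlate above 5%" check is stood in for by a relative difference between the means of the two windows, which is an assumption and may not match the actual math. Finding the base sample size is left out and passed in as `iterations`.

```python
import time
from collections import deque
from statistics import mean

SAMPLE_COUNT = 10       # samples per window ("take 10 samples", twice)
TOLERANCE = 0.05        # the 5% threshold
NEEDED_SUCCESSES = 5    # successes before the function counts as warm
MAX_FAILURES = 50       # failures before giving up


class WarmupError(RuntimeError):
    """Hypothetical error for a function that never settles."""


def take_sample(fn, iterations):
    """Time `iterations` calls of fn and return the mean seconds per call."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


def differs(older, newer, tolerance=TOLERANCE):
    """Assumed stand-in for the correlation check: do the window means
    differ by more than `tolerance`, relative to the older window?"""
    base = mean(older)
    if base == 0:
        return mean(newer) != 0
    return abs(mean(newer) - base) / base > tolerance


def warm_up(fn, iterations, sampler=take_sample):
    """Sample fn until the two windows agree often enough, or fail out."""
    older = deque((sampler(fn, iterations) for _ in range(SAMPLE_COUNT)),
                  maxlen=SAMPLE_COUNT)
    newer = deque((sampler(fn, iterations) for _ in range(SAMPLE_COUNT)),
                  maxlen=SAMPLE_COUNT)
    successes = failures = 0
    while True:
        if differs(older, newer):
            failures += 1      # trend towards failure
        else:
            successes += 1     # trend towards success
        if successes >= NEEDED_SUCCESSES:
            return list(newer)
        if failures >= MAX_FAILURES:
            raise WarmupError("function never warmed up")
        # Slide both windows forward by one sample.
        older.append(newer[0])
        newer.append(sampler(fn, iterations))
```

The counters here are cumulative rather than consecutive, which is one reading of "trend towards"; the sliding-window step is likewise a guess at what the repeat step does. Passing `sampler` as a parameter keeps the loop testable without real timing.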