Closed roastedpork closed 6 years ago
The goal is to make 'Execute' run faster, so I would ask: What is the difference between the two options you identify? (Every student taking this course should know enough stats to be able to analyse this problem.)
This is a pretty common type of problem, where there are many correct answers, but the difficulty is in how fast you can find any correct answer. On the one hand it is simply a question of brute force hashing rate, but there is also the issue of returning a correct answer as soon as it is found. There is a certain tension between the two...
I would say that the first case is much more difficult to achieve, since it would require at least a 10x improvement so that the slowest run of our implementation is faster than the fastest run of the reference.
In hindsight, I think I was misled by the programme "comparing" our result to the reference result, when all that really matters is that we obtain any result that meets the criterion.
This didn't occur to me until just now, after some thought. But since you'll be judging our performance purely by execution time, and given that our results can be quite variable even for the same input, what would be the exact method used to assess this puzzle?
There is a lot of randomness there, so from any one execution it is very difficult to tell whether one method is better than another. Someone might sequentially try just a few points and get an answer quickly by luck, while someone else might try 10000 points in slightly more time and appear slower. But on average the faster miner will win, so across a spectrum of problem sizes it is clear which is better. There is also the option of looking at integrated time over a sequence of problems, rather than individual problems.
If we assume that each miner has a startup cost `s` and a time per hash `h` (the inverse of the hashing rate), then for a problem where up to `n` candidates must be tried we get:

- best case (lucky first try): `t = s`
- worst case (all `n` candidates tried): `t = s + n*h`
- average case: `t = s + n*h/2`
That is likely to be a very good model for most miners, so if we wanted to extract `h`, we could take problem sizes `n_1, n_2, ..., n_k` and measure the corresponding times `t_1, t_2, ..., t_k`. You've then got a fitting-with-noise problem where the only unknowns are `s` and `h`, so something like maximum-likelihood estimation or even just least squares can be brought into play.
(Not that I actually do that - integrated time is a good enough proxy, but you could quite accurately infer true hashing rate if you wanted).
While working on the mining puzzle, I noticed that the performance of mining tends to vary by quite a lot. So I tried timing 5 runs of the reference execution with the exact same input:
The reference execution can take anywhere from 2 seconds to 17 seconds in this case. I suspect this is because the seed value for the RNG differs on each execution, as seen in `std::mt19937_64 iter(time(0));`. Are we expected to improve the original code such that our version is significantly faster even though its running time may be just as variable? Or are we expected to measure the average execution time of our programme?