Closed: SantiagoRubioDev closed this issue 8 years ago
You're thinking in a very promising direction, and applying existing maths to the problem is a good way of getting a "jump ahead" like we had with the FFT.
However, you're also thinking too much in terms of the reals (very much an EEE view :), and unfortunately doubles won't behave enough like the reals, even if you can get at the bits.
Notice that the recursion is:
x_{i+1} = x_i * a + b mod 2^32
so it is not over the reals, it is over the integers modulo 2^32. The problem is then how to compute 1/(a-1) modulo 2^32. It can be done, but might be more complex if you haven't done modular rings (which I don't expect you to have).
Instead, you might want to think of it in matrix form. If you were to think in terms of a column vector:
```
| x_{i+1} |     | x_i |
|    1    | = A |  1  |
```
What matrix would allow you to get from one vector to the next? Once you have that matrix, how can you jump n steps in less than n operations?
@m8pple Would I be getting warmer or colder on the matrix solution if I was looking at https://en.wikipedia.org/wiki/Exponentiation_by_squaring? #logn
That does indeed look appropriate.
The difficulty is then balancing the cost of a slow jump over a long distance, versus the very fast but sequential single-step recurrence. Same problem as in FFT - you need chunks big enough to amortise the cost of the chunking, but not so big that there is no parallelism.
I should note that this isn't the only solution, just in case anyone sees this and thinks this is the answer. There are other interesting SW+GPU hybrid approaches that don't need the fancy skipping maths, and are technically less parallel, but might be better in practice (we live in a world of finite n).
The seed is computed recursively throughout the code, which makes it inconvenient for parallelism. I tried finding a function that could give me the value of the seed at any iteration:

F(x+1) = a*F(x) + b with F(0) = c ==> F(x) = (pow(a,x)*((a-1)*c + b) - b)/(a-1)

The problem is that I cannot use integers anymore because we have an exponential and a division. I used double instead for each of the variables a, b and c, and then cast everything to uint32_t. My problem is that when I cast a very big double (>10^19) to uint32_t I get 0. I guess that is due to how doubles are represented in IEEE 754: the & operator does not work on double, so I haven't found a way to select specific bits. How would I explicitly take only the bits that interest me to make a 32-bit number? Am I on the right track, or am I trying to do something that cannot work? Should I abandon this idea and try to find another way to get parallelism?