Hi, thanks a lot for trying to improve my article! I apologize for the late reply; I've been travelling. The published algorithm is correct (as far as I can tell). It's also how I implemented fast repetition for the tools. Maybe you're confusing it with the variant that evaluates the bits from "left to right", i.e. most significant bit first. The algorithm you wanted to change, however, considers the least significant bits first. This is why we have to square/double a separate variable and then "mix it into the result variable" at the right places/magnitudes. (As I referenced in the article, there's a similar algorithm on Wikipedia.)
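
To make the "least significant bits first" idea concrete, here is a minimal Python sketch of the general technique. It is not the article's listing or the code from the tools; the function names `power_lsb_first` and `repeat_lsb_first` are hypothetical and chosen only for illustration. A separate variable is squared (or doubled) on every step, and it is mixed into the result only where the current lowest bit is set.

```python
def power_lsb_first(base: int, exponent: int) -> int:
    """Compute base**exponent by scanning the exponent's bits from the right."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    factor = base  # separate variable: base**(2**i) at step i
    while exponent > 0:
        if exponent & 1:      # lowest bit set: mix the current factor in
            result *= factor
        factor *= factor      # square for the next, more significant bit
        exponent >>= 1        # drop the bit we just handled
    return result


def repeat_lsb_first(text: str, count: int) -> str:
    """Repeat text count times with the same scheme, doubling instead of squaring."""
    if count < 0:
        raise ValueError("count must be non-negative")
    result = ""
    chunk = text  # separate variable: text repeated 2**i times at step i
    while count > 0:
        if count & 1:         # lowest bit set: append the current chunk
            result += chunk
        chunk += chunk        # double for the next, more significant bit
        count >>= 1
    return result
```

For example, `repeat_lsb_first("ab", 5)` walks the bits of 5 (binary 101) from the right: it appends the 1x chunk, skips the 2x chunk, and appends the 4x chunk. A left-to-right variant would instead double the result itself on each step, which is why it needs no separate variable.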