J08nY / pyecsca

Python Elliptic Curve Side-Channel Analysis toolkit.
https://pyecsca.org/
MIT License
56 stars 15 forks source link

A formula call is slow #11

Closed scrambler-crypto closed 1 year ago

scrambler-crypto commented 3 years ago

Trying to use this framework to generate a set of guess and try to reverse a scalar mult algorithm I am studying. The call of a formula (call) is very time consuming. Is there a way to optimize or cache the formula and speed up the computation?

As context I'm calling: some mult.multiply(guess) and some mult._add(Point_a, Point_b). Am I correct? or is there a better way to get a fast multiplication and sum of two points?

Thanks in advance.

J08nY commented 3 years ago

Hey @scrambler-crypto, The formula execution in pyecsca was not really optimized for speed and is currently kind of a bottleneck. This is due to many things I suspect the following are the biggest culprits:

Could you provide some more details on the code you are running and the performance you are getting? Just to make sure that you are not hitting some weird issue that seriously degrades performance and could be fixed. Something like "I'm running addition formula calls over this curve and they take x milliseconds each".

TODOs for me:

scrambler-crypto commented 3 years ago

Hi, @J08nY thanks for the reply. I installed gmpy2 since I saw it in the code and appreciated the improved speed. I also understand that this project has not the speed as its first aim, but I really like it because of its "malleability", I can represent a very large number of combinations of formulas.

I do not have precise numbers to answer but, roughly to compute with known and all took about 10 minutes on secp128r1 with projective coordinates (I forgot to say that I did that scalar mult for several (few hundreds) of Points P).

I sped it up a little coding a "partial multiplication" function that reuse the result and continue the scalar multiplication.

J08nY commented 3 years ago

I am working on performance evaluation and also some quite nice performance improvements in the Mod implementation. I will look at profiling the whole scalar multiplication and what takes the most time there. Expect improvement soon 🙂.

scrambler-crypto commented 3 years ago

Hi, I am very interested in the performance improvements, and I would like to contribute, if you have some tasks that are simple enough for me.

J08nY commented 3 years ago

Now there is a framework for benchmarking pyecsca (run make perf but make sure you installed the new dev dependencies as defined in setup.py) it is possible to analyze the runtime. I have only added benchmarks for Modular arithmetic, formula calls and a basic scalar multiplier, without getting into the Context and tracing stuff which I expect to be a significant overhead.

Currently it seems the largest overhead is various validations, type checks and casts around the Mod class. Its a tricky class that is able to gracefully handle things like 5 * Mod(3, 7) -> Mod(1, 7) by casting the 5, but these kinds of runtime casts and checks do create some overhead, also the check for the same modulus (such that Mod(5, 7) * Mod(5, 11) raises an exception) also creates a bit of an overhead. Some of these checks can be removed, but some must stay for the time being. I am thinking of a few speedups where the formulas could be pre-processed such that no raw integers remain in them and then these casts will be unnecessary. When looking at the formula calls and scalar multiplication, the overhead from tracing the Actions is obvious where out of 30 seconds of operations, like 5 is spent in Action handling code, even with the NullContext set. This is could be solved as suggested in my last point on my TODO list above.

I have also implemented a few speedups in the modular arithmetic of both the GMP and raw Python Mod implementation (square roots and is_residue computations should be much faster now).

Long-term I am thinking of moving performance critical parts of pyecsca (modular arithmetic, formula execution, context and tracing) to use Cython, which could provide a speedup.

J08nY commented 1 year ago

28 implemented a bunch of things mentioned here and is about what I can optimize for the time being, so closing this.

J08nY commented 4 months ago

66 Improves even more, there is now maybe a x5 speedup to previous performance.