Xilinx / Vitis_Libraries

Vitis Libraries
https://docs.xilinx.com/r/en-US/Vitis_Libraries
Apache License 2.0
860 stars 347 forks source link

[security L1 lib] operation performance is not good #60

Open lmaxeniro opened 3 years ago

lmaxeniro commented 3 years ago

Hi I am writing an HLS test program by using the ecdsa relevant library, for exp, the signature verification: xf::security::ecdsaSecp256k1.verify I am doing some benchmark tests for capturing the timing profiling.

What I noticed right now, is that this library performance (throughput) is not good, for exp, under HW emulation mode it would take more than 1 hour to finish one round signature verify simulation (I did not finish that so don't know the exact timing), from such simulation result it turns out the real performance on an HW will not be good as well (probably >30ms / iteration) -- same performance like an MCU.. From HW Emulation it looks like most of the timing spent on group operations, likewise grp_modularInv, grp_productMod, grp_add, tens of thousands of the clock cycle were consumed there.. Is there some way to further improve that? otherwise, this library looks rather useless to compare to other kinds of solutions. I checked the library functions of modular and group add ect.. there is no HLS pragma there--could that work better if I put some HLS loop optimization pragma there?

sig_verify

lmaxeniro commented 3 years ago

Hi I am using the test bench here and run through the CSIM/CSYN/and RTL Co-SIM, I get a final report that the freq can be upto ~210MHz and the clock cycle for task run will be about 6.1E6, which give 30ms / iteration. I had tried a few optimizations but that generally does not work-- is there any suggestion?

vt-lib-support commented 3 years ago

Hi @lmaxeniro , we have identified this issue and will make changes to fix this issue and update. It will greatly improved its performance and reduce resource at the same time.

Basically, given the elliptic curve, like secp256k1, its parameters are all fixed. Then it is possible to make several calculation much easier, like modularInv.

lmaxeniro commented 3 years ago

@vt-lib-support Thanks a lot for the feedback. Is there a rough schedule for that upgrade release? This will help me to make judgement how can I move forward.

vt-lib-support commented 3 years ago

@lmaxeniro , if everything is OK, this update will be ready next week.

lmaxeniro commented 3 years ago

Good to know that. Thanks for the timely update!

pmpakos commented 3 years ago

@lmaxeniro , if everything is OK, this update will be ready next week.

Excuse me for the interruption, but will this update include all parts of the library? More specifically, I am interested in "sparse" part of your libraries. Thank you! :)

vt-lib-support commented 3 years ago

@pmpakos , this will be part update, not a new release. Maybe you could post your concern about sparse library in a separate issue so we could better track it.

lmaxeniro commented 3 years ago

@pmpakos I am sorry, but I don't understand what your are talking about "sparse" part-- what I am doing test with is the same code here: https://github.com/Xilinx/Vitis_Libraries/blob/2020.1/security/L1/tests/ecdsa_verify/test.cpp

lmaxeniro commented 3 years ago

@vt-lib-support Can you please let me know if there is already an upgrade available for this issue, or shall I wait for more time?

vt-lib-support commented 3 years ago

@lmaxeniro , the update will be ready in several hours.

lmaxeniro commented 3 years ago

@vt-lib-support sorry I did not see the release update on git yet, is that done or not?

vt-lib-support commented 3 years ago

@lmaxeniro , please checkout https://github.com/Xilinx/Vitis_Libraries/commit/c2b0900e168584dd0a49336d001cbd037290446f

lmaxeniro commented 3 years ago

@vt-lib-support sorry I missed that as I am checking on the branch 2020.1. I will check that and feedback if that solves my problem. Thanks a lot for the support!

lmaxeniro commented 3 years ago

@vt-lib-support I run a test on this release, by running a co-simulation I get a Fmax=160MHz (which is poorer than the previous version ~210MHz), and the clock cycle is about 2.8e6, which is about 54% better than the previous release (6.1e6 clcok cycle). This does not likely a significant improvement though.. Do you have any comments about this result?