Open brew0722 opened 8 months ago
I found some related docs, and I think I now understand why optimizing Montgomery multiplication for the arm64 ISA is difficult. Unless arm64 provides instructions equivalent to Intel ADX and BMI2, assembly optimization probably won't help much.
However, my knowledge of cryptography is limited, so I am not sure this conclusion is accurate. Please close the issue if there are no other opinions after a final review.
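For anyone reading along, here is a minimal sketch of the kind of arithmetic in question: a single-limb Montgomery multiplication with R = 2^64, written in portable Rust. It is only an illustration of the widening multiply and carry handling that instructions like ADX and BMI2 speed up on x86_64. The function names (`neg_inv`, `mont_mul`, `to_mont`, `from_mont`) are hypothetical and are not taken from `ark-ff`, which works on multi-limb field elements.

```rust
// Illustrative single-limb Montgomery multiplication with R = 2^64.
// All names are hypothetical; this is not ark-ff's implementation.

/// Returns -n^{-1} mod 2^64 for an odd modulus n (Newton iteration).
fn neg_inv(n: u64) -> u64 {
    let mut inv: u64 = 1; // correct modulo 2 because n is odd
    for _ in 0..6 {
        // Each round doubles the number of correct low bits: 1 -> 64.
        inv = inv.wrapping_mul(2u64.wrapping_sub(n.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// Montgomery product a * b * R^{-1} mod n, assuming a, b < n and n odd.
fn mont_mul(a: u64, b: u64, n: u64, n_prime: u64) -> u64 {
    // Widening 64x64 -> 128 bit multiply.
    let t = (a as u128) * (b as u128);
    // Choose m so that t + m * n is divisible by 2^64.
    let m = (t as u64).wrapping_mul(n_prime);
    let (sum, carry) = t.overflowing_add((m as u128) * (n as u128));
    // Divide by R = 2^64, restoring the bit that overflowed past 2^128.
    let mut u = (sum >> 64) | ((carry as u128) << 64);
    if u >= n as u128 {
        u -= n as u128; // one conditional subtraction suffices since u < 2n
    }
    u as u64
}

/// Converts a into Montgomery form: a * R mod n.
fn to_mont(a: u64, n: u64) -> u64 {
    (((a as u128) << 64) % (n as u128)) as u64
}

/// Converts x out of Montgomery form: x * R^{-1} mod n.
fn from_mont(x: u64, n: u64, n_prime: u64) -> u64 {
    mont_mul(x, 1, n, n_prime)
}
```

With `n_prime = neg_inv(n)`, calling `from_mont(mont_mul(to_mont(a, n), to_mont(b, n), n, n_prime), n, n_prime)` should give the same value as `a * b mod n`. In a multi-limb field the multiply and add steps above turn into long carry chains, which is where I assume the dedicated x86_64 instructions make the difference.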
I am developing a program using arkworks' Groth16 SNARK library. In benchmarks, proof verification was sufficiently fast in my local development environment but very slow in the embedded environment.
Profiling showed that most of the overhead occurs in `ark-ff`'s field arithmetic (`mul_assign`). The current arithmetic implementation of `ark-ff` appears to have inline assembly optimization only for `x86_64`.
The embedded environment uses the `arm64` architecture on low-performance hardware such as a Raspberry Pi. Of course, the weak hardware is the main cause, but considering mobile environments in general, I think `arm64` optimization support is necessary.
I would like to ask whether you have any plans to support arithmetic optimization for arm64.