Open smartmx opened 2 years ago
The RISC-V platform is quite terrible when it comes to efficient big integer multiplication due to the lack of a carry flag and instructions like umaal.
There is a very long thread discussing this at https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/Prak1SLbys8. You will find one post explaining how to rewrite one umaal instruction using eight instructions. If you really want to port this project to RISC-V, that could be one way to make that possible.
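To make the cost concrete, here is a minimal C sketch of what umaal computes and of one possible eight-operation replacement that needs no carry flag. The helper names umaal_ref and umaal_rv32 are made up for illustration, and the sequence is an assumption of mine, not necessarily the one given in the linked post. Each statement in umaal_rv32 is meant to correspond to one RV32IM instruction (named in the trailing comment).

```c
#include <stdint.h>

/* Reference model (hypothetical helper): ARM's UMAAL replaces (hi:lo)
 * with a*b + lo + hi. The sum always fits in 64 bits, because
 * (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static void umaal_ref(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint64_t t = (uint64_t)a * b + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* Same result using only 32-bit operations. Carries are recovered with
 * unsigned comparisons, which is what sltu does on RISC-V. */
static void umaal_rv32(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint32_t pl = a * b;                               /* mul   */
    uint32_t ph = (uint32_t)(((uint64_t)a * b) >> 32); /* mulhu */
    uint32_t s1 = pl + *lo;                            /* add   */
    uint32_t c1 = s1 < pl;                             /* sltu: carry out of first add  */
    uint32_t s2 = s1 + *hi;                            /* add   */
    uint32_t c2 = s2 < s1;                             /* sltu: carry out of second add */
    ph += c1;                                          /* add   */
    ph += c2;                                          /* add   */
    *lo = s2;
    *hi = ph;
}
```

The two carries can be added to ph without further checks because the full result is known to fit in 64 bits.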
The "solution" some vendors offer is a hardware accelerator, either for big integer multiplication or directly for ECC calculations. Espressif does this for their RISC-V devices, for example.
In my opinion both the MIPS and RISC-V instruction sets are inferior to the ARM instruction set. I guess RISC-V is used only to avoid ARM's license fee.
Thanks for your help! I will try it.
Hi, Emill.
I have some questions.
In function P256_mul64, your comment is: // in: (t0,t1) = a[0..1], (a2,a3) = b[0..1] // out: a0-a3
Is t0 or t1 the high uint32 of the data? That is, t0 = (data>>32) or t0 = data&0xffffffff?
Is a2 or a3 the high uint32 of the data? That is, a2 = (data>>32) or a2 = data&0xffffffff?
For the output a0-a3, which register holds the high uint32 of the data?
Is the 128-bit output a0<<96 | a1<<64 | a2<<32 | a3, or a3<<96 | a2<<64 | a1<<32 | a0?
Thanks!
Sorry, the comment above has some errors. Those register names were from my RISC-V port.
In function P256_mul64, your comment is: // in: (r4,r5) = a[0..1], (r2,r3) = b[0..1] // out: r0-r3
Is r4 or r5 the high uint32 of the data? That is, r4 = (data>>32) or r4 = data&0xffffffff?
Is r2 or r3 the high uint32 of the data? That is, r2 = (data>>32) or r2 = data&0xffffffff?
For the output r0-r3, which register holds the high uint32 of the data?
Is the 128-bit output r0<<96 | r1<<64 | r2<<32 | r3, or r3<<96 | r2<<64 | r1<<32 | r0?
Thanks!
the out128 data = r3<<96 | r2 <<64 | r1 << 32 | r0
Everything is little endian, so lower named registers contain lower bits.
Note that RISC-V has the mulhu instruction, which you should use to get the high 32 bits of a 32x32-bit unsigned multiplication. Cortex-M0 lacks such an instruction, which results in this quite large workaround, as you can see.
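As a rough illustration of how short the routine becomes with a high-multiply available, here is a C model of a 64x64 to 128-bit multiply on 32-bit limbs in the little-endian order described above (index 0 is the least significant word, matching r0). The names mulhu32, add_cross and mul64_sketch are hypothetical, and this is only a sketch, not the project's actual P256_mul64.

```c
#include <stdint.h>

/* Stand-in for the RISC-V mulhu instruction: high 32 bits of a 32x32 product. */
static uint32_t mulhu32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Add the 64-bit product x*y into r at limb position 1 (i.e. shifted by 32 bits). */
static void add_cross(uint32_t r[4], uint32_t x, uint32_t y)
{
    uint32_t lo = x * y;
    uint32_t hi = mulhu32(x, y); /* at most 0xFFFFFFFE, so hi + carry cannot wrap */
    uint32_t c;

    r[1] += lo;
    c = r[1] < lo;               /* carry out of limb 1 (sltu) */
    hi += c;
    r[2] += hi;
    c = r[2] < hi;               /* carry out of limb 2 (sltu) */
    r[3] += c;
}

/* r[0..3] = a[0..1] * b[0..1], all little endian: lower index holds lower bits. */
static void mul64_sketch(uint32_t r[4], const uint32_t a[2], const uint32_t b[2])
{
    r[0] = a[0] * b[0];          /* low  half of a0*b0 */
    r[1] = mulhu32(a[0], b[0]);  /* high half of a0*b0 */
    r[2] = a[1] * b[1];          /* low  half of a1*b1 */
    r[3] = mulhu32(a[1], b[1]);  /* high half of a1*b1 */
    add_cross(r, a[0], b[1]);
    add_cross(r, a[1], b[0]);
}
```

The 128-bit result is then r[3]<<96 | r[2]<<64 | r[1]<<32 | r[0].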
Yes, I am using mulhu, which makes P256_mul64 much easier. It is about 25 lines of code.
Thank you.
I'm trying to adapt this project to the RISC-V platform, but RISC-V has no carry or overflow flags for arithmetic. This means the Cortex assembly cannot simply be translated instruction by instruction and run on RISC-V. My current idea is to create a global variable to store the overflow flag, but this will greatly reduce the calculation speed. Do you have a better solution? Thanks!
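One common alternative to a global flag, sketched here under my own assumptions, is to recover each carry with an unsigned comparison (sltu on RISC-V) and keep it in an ordinary register. The helper add_limbs below is hypothetical and models a multi-word addition in C, with limb 0 as the least significant word.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 32-bit limbs (limb 0 is least significant).
 * Returns the carry out of the top limb. No flag register or global
 * state is needed: an unsigned sum wrapped exactly when it is smaller
 * than one of its operands. */
static uint32_t add_limbs(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s  = a[i] + carry; /* add  */
        uint32_t c1 = s < carry;    /* sltu: wrapped while adding the carry */
        uint32_t t  = s + b[i];     /* add  */
        uint32_t c2 = t < s;        /* sltu: wrapped while adding b[i]      */
        r[i] = t;
        carry = c1 | c2;            /* or: at most one of the two can be set */
    }
    return carry;
}
```

This costs a few extra instructions per limb compared with an add-with-carry instruction, but the carry stays in a register across the loop.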