eclipse-threadx / netxduo

Eclipse ThreadX - NetXDuo is an advanced, industrial-grade TCP/IP network stack designed specifically for deeply embedded real-time and IoT applications
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/netx-duo/index.md
MIT License

crypto hardware acceleration support #238

Open nicolasb565 opened 5 months ago

nicolasb565 commented 5 months ago

I tried using NetX Duo for an HTTPS server on an STM32H5 MCU. The TLS 1.2 handshake takes over 8 s. The client sends multiple retries before we have time to respond, and these overflow our Ethernet RX buffer.

I would like to have hardware acceleration support, for example hooks through which the application can register function pointers. I want hashing and encryption/decryption to be supported. When non-null, these function pointers would be used instead of the software implementation. Alternatively, the software implementations could be weak symbols; then we would only need to define the implementations we want, and they would be used instead.
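The function-pointer variant of this proposal could look roughly like the sketch below. Everything in it is hypothetical: `crypto_hw_hooks_t`, `crypto_register_hw_hooks()`, `crypto_sha256()` and the two stand-in routines are illustrative names, not part of the NetX Duo or NetX Crypto API, and the stand-ins compute no digest; they only count calls so the dispatch logic can be exercised.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook signature for a SHA-256 digest; returns 0 on success.
 * These names are illustrative and are NOT part of the NetX Duo API. */
typedef int (*sha256_hook_t)(const uint8_t *data, size_t len, uint8_t digest[32]);

/* Optional hardware hooks: a NULL entry means "use the software path". */
typedef struct {
    sha256_hook_t sha256;
    /* Further entries (AES-GCM, RSA, ...) would follow the same pattern. */
} crypto_hw_hooks_t;

static crypto_hw_hooks_t g_hooks; /* static storage, so every hook starts as NULL */

/* Stand-in for the stack's existing software SHA-256. It computes nothing;
 * it only counts calls so the dispatch logic can be exercised. */
static unsigned g_sw_calls;
static int sw_sha256_standin(const uint8_t *data, size_t len, uint8_t digest[32])
{
    (void)data; (void)len; (void)digest;
    g_sw_calls++;
    return 0;
}

/* Stand-in for an application routine that would drive the HASH peripheral. */
static unsigned g_hw_calls;
static int hw_sha256_standin(const uint8_t *data, size_t len, uint8_t digest[32])
{
    (void)data; (void)len; (void)digest;
    g_hw_calls++;
    return 0;
}

/* Called once by the application at startup, before any TLS session exists. */
void crypto_register_hw_hooks(const crypto_hw_hooks_t *hooks)
{
    g_hooks = *hooks;
}

/* What the TLS code would call: the hardware hook if registered, else software. */
int crypto_sha256(const uint8_t *data, size_t len, uint8_t digest[32])
{
    if (g_hooks.sha256 != NULL) {
        return g_hooks.sha256(data, len, digest);
    }
    return sw_sha256_standin(data, len, digest);
}
```

The weak-symbol alternative would instead have the library define its software routine with GCC's `__attribute__((weak))`, so that a strong definition of the same name in an application source file replaces it at link time. That override has to come from a different translation unit; defining both in one file is a redefinition error.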

Hnz2 commented 5 months ago

Hi nicolasb565,

I don't have good news for you. The STM32H5 does not have an RSA/ECC hardware accelerator. It has hardware cryptography accelerators for AES and DES, plus hash accelerators, but these can't improve TLS connection establishment time. They can only improve transmission speed once the connection is already established. I have dealt with the same issue on an STM32H7, so I can give you a few tips on how to improve your TLS connection time:

With these settings I was able to get the TLS connection establishment time on an STM32H735 @ 520 MHz down to approximately 300 ms, which is acceptable for me. Executing the code from ITCM RAM might improve this further, but I haven't had the opportunity to test it.

Regarding hardware accelerators for AES and DES: these can significantly improve transmission speed, by at least 10x. On my STM32H7, HTTPS TX speed is approximately 300 kByte/s with software cryptography (without TLS, more than 80 Mbit/s on a 100 Mbit/s network). Years ago ST promised that hardware acceleration for NetX Secure on the STM32H7 would be implemented. Unfortunately, it never will be: ST literally decided to drop support for Azure RTOS / Eclipse ThreadX, because they don't believe in community support for Eclipse ThreadX. I tried for a few days to add hardware acceleration support to NetX Secure, but without success. I found that this is not an easy task; the design of the ST HAL complicates it. As a reference I used the Renesas NetX Crypto implementation.
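For scale, the two STM32H7 throughput figures can be put in the same unit. This is plain arithmetic on the numbers quoted above, assuming 1 kByte = 1000 bytes (with 1024 bytes the ratio comes out at about 32.6):

```latex
\begin{aligned}
300\ \text{kB/s} \times 8\ \text{bit/B} &= 2.4\ \text{Mbit/s} \\
\frac{80\ \text{Mbit/s}}{2.4\ \text{Mbit/s}} &\approx 33
\end{aligned}
```

Since the plain TCP figure is a lower bound ("more than 80 Mbit/s"), HTTPS with software crypto was at least roughly 30x slower than plain TCP on the same link, which suggests the software TLS path rather than the 100 Mbit/s link is the limiting factor once the connection is up.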

Jan

nicolasb565 commented 5 months ago

Hi hnz2,

We are using the STM32H563, but the STM32H573 supports RSA hardware acceleration, so maybe we could use that one instead. They seem to be from the same family, but the H573 has hardware-accelerated crypto.

Those numbers are good to know. Are you using a special compiler or flags to get that kind of performance? We are using GCC from STM32CubeIDE. I am going to look into using the ST crypto libraries to see whether they are much faster than Microsoft's implementation. Our MCU runs at 250 MHz, which is the maximum advertised frequency.

What kind/size of key are you using for TLS? We are using a 2048-bit RSA key.

Hnz2 commented 5 months ago

Hi nicolasb565,

I did not know that there are STM32H5 devices with the PKA (public key accelerator) peripheral. If you are able to change the device and use this accelerator, I think it will significantly improve your TLS connection time. But I am not sure how easily it can be integrated into NetX Secure.

Regarding the compiler: I was talking about changing from -O0 to -Og. In my case, this alone was enough to significantly improve connection establishment time. I use RSA 2048 as well.

The CRYP peripheral on the STM32H7 supports HASH and cryptography (AES, DES). It does not support PKA. I tested AES-128-GCM with the HAL library implementation, and encryption was at least 10x faster. But I had trouble integrating it into NetX Secure; the main issue for me was handling padding properly with the HAL library. If you manage to implement cryptography acceleration with the HAL, let me know...
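The padding problem is one instance of a general driver-wrapper chore: TLS record lengths are arbitrary, while crypto peripherals typically process whole 128-bit blocks. Below is a minimal sketch of staging the trailing partial block, assuming the peripheral only accepts full 16-byte blocks; the helper names are hypothetical and no STM32 HAL calls are shown. GCM itself needs no padding, so for GCM the peripheral must still be given the real byte count (or the surplus output bytes must be kept out of the authentication tag); otherwise the tag will not match other implementations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HW_AES_BLOCK_SIZE 16u /* AES block size in bytes */

/* Round a byte length up to a whole number of AES blocks. */
static size_t hw_aes_padded_len(size_t len)
{
    return (len + HW_AES_BLOCK_SIZE - 1u) / HW_AES_BLOCK_SIZE * HW_AES_BLOCK_SIZE;
}

/* Copy the trailing partial block (at most 16 bytes) into a zero-filled
 * staging buffer so the peripheral always sees a full block.
 * Returns how many output bytes the caller must keep; the rest are discarded. */
static size_t hw_aes_stage_last_block(uint8_t staging[HW_AES_BLOCK_SIZE],
                                      const uint8_t *src, size_t n)
{
    memset(staging, 0, HW_AES_BLOCK_SIZE);
    if (n > HW_AES_BLOCK_SIZE) {
        n = HW_AES_BLOCK_SIZE;
    }
    memcpy(staging, src, n);
    return n;
}
```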

Jan

hwmaier commented 5 months ago

This is an interesting discussion, and I'd like to share some thoughts on this topic.

The effort required to implement hardware-accelerated crypto routines must not be underestimated. I have been involved in testing and analysis of a vendor-specific NetX Crypto implementation and gained insight into its complexity.

Adapting cryptography routines is not for the faint-hearted. It requires a lot of expertise and an understanding of how cryptographic algorithms work, their specific modes, and how they are supposed to be implemented. It is easy to make a mistake and end up with a security vulnerability.

Given the specialist know-how and man-hours required, I cannot see how this can happen without the support and commitment of the MCU vendor.

But if MCU vendors are not prepared to step up and provide implementations for crypto and network hardware, then I also cannot see a bright future for NetX.

A possible path forward could be a porting layer where NetX Crypto uses mbedTLS under the hood. Most MCU vendors provide hardware-accelerated implementations for mbedTLS and maintain them.

Hnz2 commented 5 months ago

Hi hwmaier,

Thank you for the valuable comment. I am glad you confirmed my observation that implementing hardware acceleration in NetX Crypto is not an easy task, and that my decision to give up on the implementation for now was a good one.

By the way, an ST employee on the ST community forum said that replacing a software implementation with a hardware one is "usually a trivial matter."

hwmaier commented 5 months ago

By the way, an ST employee on the ST community forum said that replacing a software implementation with a hardware one is "usually a trivial matter."

"Trivial matter" is an interesting assessment, and not one I share. You have to understand the encryption modes (even though they all use AES under the hood, the modes work very differently), deal with padding, counters, nonces and hashes, handle the private encryption keys, match block alignment to what the crypto hardware requires, deal with block chaining, and so on and so forth.
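To make the "counters" and "nonces" items concrete: in AES-GCM (NIST SP 800-38D) with the usual 96-bit IV, the pre-counter block is J0 = IV || 0x00000001, J0 is used to encrypt the authentication tag, and the data blocks use inc32(J0), inc32(inc32(J0)), and so on, where inc32 increments only the low 32 bits as a big-endian integer modulo 2^32. A hardware wrapper has to reproduce exactly this. A minimal sketch of the two operations (the function names are illustrative):

```c
#include <stdint.h>
#include <string.h>

#define GCM_BLOCK_SIZE 16u
#define GCM_IV_LEN     12u /* 96-bit IV, the recommended GCM IV length */

/* Build the pre-counter block J0 = IV || 0x00000001 for a 96-bit IV. */
static void gcm_build_j0(uint8_t j0[GCM_BLOCK_SIZE], const uint8_t iv[GCM_IV_LEN])
{
    memcpy(j0, iv, GCM_IV_LEN);
    j0[12] = 0x00;
    j0[13] = 0x00;
    j0[14] = 0x00;
    j0[15] = 0x01;
}

/* inc32: increment the rightmost 32 bits as a big-endian integer modulo 2^32,
 * leaving the leftmost 96 bits untouched. */
static void gcm_inc32(uint8_t block[GCM_BLOCK_SIZE])
{
    uint32_t ctr = ((uint32_t)block[12] << 24) | ((uint32_t)block[13] << 16) |
                   ((uint32_t)block[14] << 8) | (uint32_t)block[15];
    ctr += 1u; /* unsigned arithmetic wraps modulo 2^32 */
    block[12] = (uint8_t)(ctr >> 24);
    block[13] = (uint8_t)(ctr >> 16);
    block[14] = (uint8_t)(ctr >> 8);
    block[15] = (uint8_t)ctr;
}
```

In TLS 1.2 AES-GCM cipher suites, the 12-byte IV is the 4-byte implicit salt from the key block followed by the 8-byte explicit nonce carried in each record (RFC 5288), so a wrapper also has to assemble the IV correctly before building J0.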

nicolasb565 commented 5 months ago

With -O2 optimization, performance is quite acceptable: the TLS connection takes about 1 s. With -Og it takes 1.5 s. The only issue is that the Ethernet RX DMA sometimes stops working if there are too many incoming packets. At 1 s this works most of the time, because the number of TCP retries is low enough. With -Og at 1.5 s, there are too many TCP retries, so it does not work. I will need to fix this Ethernet RX DMA bug.

yuxinzhou5 commented 5 months ago

@nicolasb565 Glad you are able to reduce the TLS connection time down to 1s. I might be able to help you with the Ethernet problem you have been looking at. Feel free to contact me yzhou@px5rtos.com.

yuxinzhou5 commented 5 months ago

@hwmaier NetX TLS is designed to plug in different crypto algorithms. In 2021 or 2022, we benchmarked NetX Crypto against the MbedTLS crypto library (running on STM32H7, using the IAR compiler). NetX Crypto was 5-10% faster than MbedTLS, including the RSA algorithm. Plus, NetX Crypto was FIPS 140-2 certified. As the Eclipse Foundation now takes over the code base, I hope all the technical advantages can be maintained.

hwmaier commented 5 months ago

@yuxinzhou5

I hope all the technical advantages can be maintained.

So do I. But the foundation has been very quiet and vague about how this can be achieved. Who takes over code base maintenance once Microsoft completes the handover to Eclipse?

yuxinzhou5 commented 5 months ago

@hwmaier As far as I know, the Eclipse Foundation is in the process of setting up a ThreadX working group, which will make all the technical and business decisions. Until the working group is set up (and the process is better understood), we at rtosx.com will try to help the community with technical questions as much as we can.

nicolasb565 commented 5 months ago

What worries me is that ST told me we don't have the latest security updates. They said the issue was fixed in 6.3.0, and we have 6.2.0. Now we are supposed to wait until the Eclipse Foundation takes over.

yuxinzhou5 commented 5 months ago

What worries me is that ST told me we don't have the latest security updates. They said the issue was fixed in 6.3.0, and we have 6.2.0. Now we are supposed to wait until the Eclipse Foundation takes over.

@nicolasb565 Did you receive NetX from ST as part of their firmware release? Going forward, how ST (or any MCU vendor) packages ThreadX into its distribution is really a question for Eclipse and ST (or the MCU vendor). Just looking at this repo, the current version is 6.4.0. I hope this gives you enough information to pick up the bug fixes.

nicolasb565 commented 5 months ago

@yuxinzhou5 Well, it's just that our IDE auto-generates a lot of code, so it's not very practical. That includes ThreadX/NetX Duo/USBX/USBPD, all integrated together. Anyway, our product is not released yet, so we are going to make sure everything works first. I also don't get to decide on priorities, but at least we know about the issue.

nicolasb565 commented 5 months ago

The Ethernet RX buffer issue was caused by insufficient IP stack thread priority. The IP stack thread needs a higher priority than the web server. In ThreadX, that means a numerically smaller priority value, because smaller numbers mean higher priority.
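This fix can be captured as a compile-time check, so a later change (for example, regenerated IDE code) cannot silently invert the priorities. A minimal sketch; the macro names and the values 2 and 10 are hypothetical, and in a real project they would be whatever is passed as the `priority` argument of `nx_ip_create()` and as the priority of the thread that runs the web server (e.g. in its `tx_thread_create()` call).

```c
#include <stdbool.h>

/* Hypothetical priority assignments. In ThreadX, 0 is the highest priority
 * and larger numbers mean lower priority. */
#define APP_IP_THREAD_PRIORITY         2u  /* NetX Duo IP helper thread */
#define APP_WEB_SERVER_THREAD_PRIORITY 10u /* HTTPS server thread */

/* Fail the build if the web server would outrank the IP thread. */
_Static_assert(APP_IP_THREAD_PRIORITY < APP_WEB_SERVER_THREAD_PRIORITY,
               "IP thread needs a numerically smaller (higher) priority than the web server");

/* True if ThreadX priority a is more urgent than priority b. */
static bool threadx_priority_higher(unsigned a, unsigned b)
{
    return a < b;
}
```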