eclipse-californium / californium

CoAP/DTLS Java Implementation
https://www.eclipse.org/californium/

[3.7.X] Allow Exchange not to use SerialExecutor #2091

Closed JimmyBaize closed 1 year ago

JimmyBaize commented 1 year ago

My application scenario: a high-performance CoAP-over-TCP server. In large-capacity, high-performance scenarios we don't want to switch threads during message processing, because the thread switch introduced by the Exchange's SerialExecutor costs performance. We expect that no thread switch happens during CoAP decoding, message forwarding (for example with an async HTTP client), and CoAP encoding.

In version 2.X, we could set Exchange.executor to null to avoid switching threads, but this is restricted in version 3.x: https://github.com/eclipse-californium/californium/blob/aec3f78c4ffa85460850901d69af1e14c1c28a1a/californium-core/src/main/java/org/eclipse/californium/core/network/Exchange.java#L364-L372

https://github.com/eclipse-californium/californium/blob/aec3f78c4ffa85460850901d69af1e14c1c28a1a/californium-core/src/main/java/org/eclipse/californium/core/network/Exchange.java#L623-L625

So, can the API be given more freedom? As in version 2.X, allow skipping checkOwner and do not force the use of SerialExecutor: https://github.com/eclipse-californium/californium/blob/5220bb08301a3b13e8b9bef1558718b19be821ca/californium-core/src/main/java/org/eclipse/californium/core/network/Exchange.java#L1145-L1149 If so, I'd like to submit a PR.
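To make the two execution models concrete, here is a minimal, self-contained sketch using only `java.util.concurrent`. It is not Californium's code: `SerialQueueExecutor` is a hypothetical stand-in modeled on the serial-executor pattern from the JDK `Executor` documentation, and the "direct" executor (`Runnable::run`) stands in for the behavior the issue asks for, where the task runs on the calling thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SerialVsDirect {

    /** Hypothetical stand-in (not Californium's SerialExecutor): runs tasks one at a time on a backing pool. */
    static final class SerialQueueExecutor implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor backend;
        private Runnable active;

        SerialQueueExecutor(Executor backend) {
            this.backend = backend;
        }

        @Override
        public synchronized void execute(Runnable task) {
            // Wrap the task so that the next queued task is scheduled when it finishes.
            tasks.add(() -> {
                try {
                    task.run();
                } finally {
                    scheduleNext();
                }
            });
            if (active == null) {
                scheduleNext();
            }
        }

        private synchronized void scheduleNext() {
            active = tasks.poll();
            if (active != null) {
                backend.execute(active); // hand-off to a pool thread: the "thread switch"
            }
        }
    }

    /** Returns the name of the thread that actually ran the task. */
    static String runOn(Executor executor) throws Exception {
        CompletableFuture<String> threadName = new CompletableFuture<>();
        executor.execute(() -> threadName.complete(Thread.currentThread().getName()));
        return threadName.get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Executor direct = Runnable::run; // no hand-off, runs on the caller
            System.out.println("caller thread: " + Thread.currentThread().getName());
            System.out.println("direct ran on: " + runOn(direct));
            System.out.println("serial ran on: " + runOn(new SerialQueueExecutor(pool)));
        } finally {
            pool.shutdown();
        }
    }
}
```

The direct variant stays on the caller's thread but gives up the one-task-at-a-time guarantee unless something else (for example a single I/O thread per connection) already provides it.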

boaks commented 1 year ago

Do you have any benchmarks? What are the results, and what speedup do you get?

Without the serial executor, parallel processing applied to the same exchange causes failures, mainly in the observe/notify and blockwise parts. At the time the SerialExecutor was introduced, those failures also included serious memory leaks. In 2.x, null was used for unit tests.
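The kind of failure described above can be illustrated in isolation. The sketch below is an assumption-laden toy, not Californium code: `TransferState` is a hypothetical stand-in for per-exchange state (such as blockwise progress), and a single-thread executor stands in for the serializing executor. When several threads submit updates, the serialized run always counts every update, while the direct run may lose some.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ExchangeStateRace {

    /** Hypothetical per-exchange state; deliberately not thread-safe. */
    static final class TransferState {
        int blocksSeen;
    }

    /** Submits producers * perProducer updates from several threads and returns the final count. */
    static int process(Executor exchangeExecutor, int producers, int perProducer)
            throws InterruptedException {
        TransferState state = new TransferState();
        CountDownLatch done = new CountDownLatch(producers * perProducer);
        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perProducer; j++) {
                    exchangeExecutor.execute(() -> {
                        state.blocksSeen++; // unsynchronized read-modify-write
                        done.countDown();
                    });
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        done.await(30, TimeUnit.SECONDS); // wait until all queued updates have run
        return state.blocksSeen;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService serial = Executors.newSingleThreadExecutor();
        try {
            // Serialized: every update runs on one thread, so none is lost.
            System.out.println("serialized: " + process(serial, 4, 100_000));
            // Direct: updates run concurrently on the producer threads and may be lost.
            System.out.println("direct:     " + process(Runnable::run, 4, 100_000));
        } finally {
            serial.shutdown();
        }
    }
}
```

The direct result varies from run to run, which is why removing the serialization is only safe when the caller guarantees that a given exchange is never touched by two threads at once.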

Anyway, if you don't care about potential failures and leaks (or handle them on your own), and if you have results that justify such a change, I will think about the best way to do it. But first, please provide results.

JimmyBaize commented 1 year ago

Do you have any benchmarks? What are the results, and what speedup do you get?

Oh, yes. I was going to upgrade from 2.X to 3.X, but haven't done so yet due to compatibility issues, so I haven't run a performance comparison. But in theory, Java thread switching does cause a performance loss.
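A rough way to put a number on the hand-off cost in isolation is a small timing loop like the one below. This is only a sketch under stated assumptions: `HandOffCost` and `measureNanos` are hypothetical names, the task does no real work, and a naive loop like this is sensitive to JIT warm-up and scheduling noise, so a harness such as JMH would be more reliable. It also says nothing about end-to-end CoAP throughput, which is what actually matters for this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class HandOffCost {

    /** Runs the given number of trivial tasks through the executor and returns the elapsed nanoseconds. */
    static long measureNanos(Executor executor, int tasks) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(tasks);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            executor.execute(done::countDown);
        }
        done.await(); // wait until every task has actually run
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int tasks = 1_000_000;
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Executor direct = Runnable::run; // runs each task on the calling thread

            // Warm-up rounds so the JIT has compiled both paths before measuring.
            measureNanos(direct, tasks);
            measureNanos(pool, tasks);

            long directNanos = measureNanos(direct, tasks);
            long pooledNanos = measureNanos(pool, tasks);
            System.out.printf("direct: %d ns/task%n", directNanos / tasks);
            System.out.printf("pooled: %d ns/task%n", pooledNanos / tasks);
        } finally {
            pool.shutdown();
        }
    }
}
```

Whatever per-task difference this prints has to be compared against the total cost of parsing, matching, and network I/O per message before drawing conclusions.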

I use Californium as a message forwarder (like nginx or Vert.x), so I expect to use Californium to implement a high-performance CoAP forwarder, like Vert.x (which also uses Netty). Vert.x only forwards messages and does not switch threads during encoding and decoding. With Netty, frequent thread switches are usually avoided because the Netty I/O thread naturally ensures that the messages of each TCP channel are processed serially. We do not need to change the thread pool multiple times; we only need to complete the protocol-layer processing in the Netty I/O thread.

When Netty is used, can the thread-pool switch be skipped at these two places? https://github.com/eclipse-californium/californium/blob/aec3f78c4ffa85460850901d69af1e14c1c28a1a/californium-core/src/main/java/org/eclipse/californium/core/network/CoapEndpoint.java#L1027-L1046

https://github.com/eclipse-californium/californium/blob/aec3f78c4ffa85460850901d69af1e14c1c28a1a/californium-core/src/main/java/org/eclipse/californium/core/network/CoapEndpoint.java#L718-L728

This may be a big change, but is it possible to open the API so that users can implement Endpoints themselves for specific requirements, using californium.Stack, californium.Matcher, and californium.layer?

boaks commented 1 year ago

But in theory, Java thread switching does cause a performance loss.

All theory is gray

For the UDP part, the idea of reducing thread switching in order to gain performance has been raised a couple of times in the past years, and the speedup has never been verified. Therefore, please verify the speedup first.

I use Californium as a message forwarder (like nginx or Vert.x), so I expect to use Californium to implement a high-performance CoAP forwarder, like Vert.x (which also uses Netty). Vert.x only forwards messages and does not switch threads during encoding and decoding.

I'm not sure what you really want to do (and who should "fulfill" your expectation). To me it sounds weird. Californium is a full CoAP stack; it processes messages according to RFC 7252, 7641, and 7959. RFC 8323 support is still experimental and mainly covers the different message format. If you only want to forward the CoAP message "as it is", without applying CoAP functionality, you may implement something similar to an endpoint, which mainly parses the messages and then does whatever your forwarding requires.

Netty works that way, because the other functions are designed for that. Californium's Exchange and CoapStack aren't designed for that.

When Netty is used, can the thread-pool switch be skipped at these two places? This may be a big change, but is it possible to open the API so that users can implement Endpoints themselves for specific requirements, using californium.Stack, californium.Matcher, and californium.layer?

"Open the API" will not be the big thing, unless you expect that the other functions will then work with other threading models as well. Making that work will clearly be someone else's task.

So again: before we spend too much time on this idea, verify the speedup.

After that, clarify which functions you really want for the forwarding. E.g., if you replace the processing stack with a "reduced stack", then exchanging the execution may have a chance, as long as you spend your own time on that "reduced stack".

boaks commented 1 year ago

Any update?

boaks commented 1 year ago

Don't hesitate to add a comment or open a new issue, if you're interested again.

boaks commented 1 year ago

See PR https://github.com/eclipse-californium/californium/pull/2153 about null as Executor for Exchanges.

boaks commented 1 year ago

Using the BenchmarkClient with 1000 concurrent clients sending CON requests against the ExtendedTestServer still doesn't show any performance gain when disabling the SerialExecutor and the CoapEndpoint.execute(final Runnable task).

There may be other setups that show something, but for 4 years now no one has demonstrated the performance benefit of switching the execution model.

From my point of view, I will keep the "experimental" DUMMY_EXECUTOR, but without spending time on developing it further.

JimmyBaize commented 1 year ago

Okay, DUMMY_EXECUTOR is very good