jiangxiaosheng closed this issue 1 year ago.
Hi Sheng. Thanks for raising this question.
You're right that the documentation doesn't do a great job of explaining the expected thread model. I'd left a brief note, but it's not enough, since reading the doc for `enqueue_request` (https://github.com/erpc-io/eRPC/blob/094c17c3cd9b48bcfbed63f455cc85b9976bd43f/src/rpc.h#L65) gives no indication of any constraints. I'll try to write something for each public API function in the coming days. If you'd like to make a PR for the changes to the `rpc.h` docstrings, it'll be appreciated.
To answer your question: in the current implementation, none of the API functions are thread-safe (the only exception is the server-side background threads, which the application doesn't control). This means that the same client thread that calls `enqueue_request` must poll for its completion.
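To make this constraint concrete, here is a toy, single-threaded model of the "enqueue, then poll on the same thread" pattern. It is a sketch under stated assumptions, not eRPC code: `ToyRpc`, `call_and_wait`, and the fabricated echo response are hypothetical, and the method names only loosely mirror eRPC's `enqueue_request` and event-loop calls for readability.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical single-threaded endpoint (NOT eRPC). A request only makes
// progress when the thread that enqueued it runs the event loop.
class ToyRpc {
 public:
  using Continuation = std::function<void(const std::string&)>;

  // Queue a request; nothing happens until the event loop runs.
  void enqueue_request(std::string req, Continuation cont) {
    pending_.emplace_back(std::move(req), std::move(cont));
  }

  // One event-loop iteration: take one pending request, fabricate an echo
  // response, and invoke its continuation on this same thread.
  void run_event_loop_once() {
    if (pending_.empty()) return;
    auto [req, cont] = std::move(pending_.front());
    pending_.pop_front();
    cont("echo: " + req);
  }

 private:
  std::deque<std::pair<std::string, Continuation>> pending_;
};

// Reactive client pattern: the thread that enqueues is the thread that polls.
std::string call_and_wait(ToyRpc& rpc, const std::string& req) {
  bool done = false;
  std::string resp;
  rpc.enqueue_request(req, [&](const std::string& r) {
    resp = r;
    done = true;
  });
  while (!done) rpc.run_event_loop_once();  // busy-poll until completion
  return resp;
}
```

Calling `call_and_wait(rpc, "hello")` from the thread that constructed `rpc` returns `"echo: hello"`. Because the continuation runs on the polling thread, `done` and `resp` need no locking in this toy; splitting enqueueing and polling across threads would require synchronization that the model deliberately avoids.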
Regarding your use case: the computation logic for congestion control and retransmissions is quite simple and does not contribute significantly to tail latency. If we split it out of the rest of the RPC processing onto a separate core, the cross-core coordination would hurt far more. If you're interested in this topic, this paper presents some great material: https://sigops.org/s/conferences/sosp/2013/papers/p33-david.pdf.
Thanks Anuj for the quick response and clear explanation! Yes, I'd like to make a PR adding more comments about the thread model.
Thanks for providing this fast RPC framework! I recently started programming with eRPC and found that, as far as I can tell, some necessary clarifications of the thread model are missing, which can lead to bugs that are hard to trace.
The event loop must run in the same thread in which the `Rpc` endpoint is created; otherwise the RPC request is simply lost, i.e., the server never receives it. No runtime error reports this situation, because the check is done in an `assert` statement, which is ignored in release builds.

After fixing this bug, I let a server-side background thread run the event loop, while on the client side the event loop runs only after an `enqueue_request` call, as in the hello_world example. However, if I take the same approach on the client as on the server, i.e., spawn a background thread that busy-loops on the event loop while the client thread just enqueues requests, the RPC request is lost again. A simple program that illustrates my approach (I only made minor modifications to `client.cc` in the hello_world example):

The approach in the original `client.cc` works. I was just thinking that running the event loop in a client background thread could help reduce tail latency, because other work such as congestion control and retransmission (if eRPC provides such mechanisms) could be handled in the background. I guess the problem might be that the `enqueue_request` call is not thread-safe when made from the user thread. So is it best practice to run the event loop reactively on the client side, as the hello_world example does? I would really appreciate it if anyone could clarify this.
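The issue reports that the only guard against driving the endpoint from the wrong thread is an `assert`, which disappears in release builds (when `NDEBUG` is defined). As a hedged sketch of an always-on alternative, the hypothetical `OwnerThreadGuard` below records the creating thread and throws when any other thread uses it. It is not part of eRPC; `cross_thread_call_rejected` is likewise a hypothetical helper that mimics the failing setup (one thread owns the endpoint, another thread calls into it).

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical helper (NOT part of eRPC): remembers the thread that created
// the endpoint and rejects calls from any other thread. Unlike assert(), this
// check is not compiled out in release builds.
class OwnerThreadGuard {
 public:
  OwnerThreadGuard() : owner_(std::this_thread::get_id()) {}

  // Call at the top of every endpoint method (enqueue, event loop, ...).
  void check() const {
    if (std::this_thread::get_id() != owner_) {
      throw std::logic_error("endpoint used from a thread that does not own it");
    }
  }

 private:
  std::thread::id owner_;
};

// Returns true if a call from a freshly spawned thread is rejected, i.e. the
// misuse surfaces as an error instead of a silently lost request.
bool cross_thread_call_rejected(const OwnerThreadGuard& guard) {
  bool rejected = false;
  std::thread other([&] {
    try {
      guard.check();
    } catch (const std::logic_error&) {
      rejected = true;
    }
  });
  other.join();  // join() makes the write to `rejected` visible here
  return rejected;
}
```

A guard like this costs one thread-id comparison per call, which is cheap relative to an RPC but not free, so it is a trade-off against keeping such checks debug-only.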