Closed zeenix closed 1 year ago
hmm.. could you check if making ObjectServer not on-demand helps?
In GitLab by @elmarco on Sep 20, 2021, 23:07
Since the server task is started immediately, even before we register a path (or a service name, with the modification I tried), there are always chances that we loose messages there. (with the sync API, I didn't have this problem, because the ordering was more in control, and we have/had queues iirc).
Maybe we need to a way to start/stop the server task...
Since the server task is started immediately, even before we register a path (or a service name, with the modification I tried), there are always chances that we loose messages there.
Hmm.. true.
with the sync API, I didn't have this problem, because the ordering was more in control, and we have/had queues iirc).
You would definitely have this problem if I hadn't insisted on having the request_name
API on ObjectServer
instead of Connection
.
Maybe we need to a way to start/stop the server task...
maybe and I'm actually thinking about that due to other reasons, but here maybe the solution is to go back to request_name
on ObjectServer
.
In GitLab by @elmarco on Sep 21, 2021, 10:39
with the sync API, I didn't have this problem, because the ordering was more in control, and we have/had queues iirc).
You would definitely have this problem if I hadn't insisted on having the
request_name
API onObjectServer
instead ofConnection
.
There is no point denying it was simpler to get it working reliably when everything was sync. The code has evolved since then, you are more familiar than I am with the async stuff. I don't recall you had to insist convince me much on that request_name change anyway.
Since the receiver task starts early, chances of losing messages are still high, regardless of when request_name or object server is setup.
Either everything must be setup before starting the task, or we need some queuing somewhere. I think I would rather have 1.
Maybe we need to a way to start/stop the server task...
maybe and I'm actually thinking about that due to other reasons, but here maybe the solution is to go back to
request_name
onObjectServer
.
Please explain, what race are you solving that way.
There is no point denying it was simpler to get it working reliably when everything was sync
So while you may be right about how things used to be simpler in your days, I don't see how the async changes are the culprit here. If you say those changes make it harder to solve this, maybe I could agree there but they're not the cause.
Since the receiver task starts early, chances of losing messages are still high, regardless of when request_name or object server is setup.
If name is requested after the service API is setup, there is no chance of missing messages. We even got rid of the overflow to ensure that in general. The only thing left that can still miss messages is property caching code, that I'm trying to also fix (!383).
I don't recall you had to
insistconvince me much on that request_name change anyway.
https://gitlab.freedesktop.org/dbus/zbus/-/merge_requests/332#note_974621
That was much much longer discussion than I wanted.
In GitLab by @elmarco on Sep 21, 2021, 14:16
https://gitlab.freedesktop.org/dbus/zbus/-/merge_requests/332#note_974621
That was much much longer discussion than I wanted.
Ok, but please don't "insist" but demonstrate: we didn't nail it apparently. So the discussion goes on.
In GitLab by @danieldg on Sep 21, 2021, 15:51
I would consider using request_name before setting up the ObjectServer to always be incorrect; request_name even references #68 in its code, though I don't see the referenced warning in the docs. I think this means that the prototype of that function should be &self
and not self
, to discourage its incorrect use in a builder pattern.
@danieldg Yeah, agreed. Also moving it back to the ObjectServer
, we can panic (or emit warnings once we've logging) if request_name
is used before any at
calls from the user (will need to be careful about excluding ObjectServer's own at
usage though).
In GitLab by @elmarco on Sep 21, 2021, 17:40
You could still have such warning, even if the request/release name methods remains on Connection. I still don't agree that should be ObjectServer methods, and you have demonstrated that it fits in Connection so far.
Furthermore, it's completely valid to have at() being called after request name is called, as you may want to have more objects added during runtime.
Instead, I think we should have logging/warning for dropping incoming messages, yes!
You could still have such warning, even if the request/release name methods remains on Connection. I still don't agree that should be ObjectServer methods, and you have demonstrated that it fits in Connection so far.
I guess if the methods don't implement the builder pattern (as @danieldg pointed out) and we change the book's server chapter to not mention requesting name before showing how to expose their interfaces, it'd already help avoid this issue.
Furthermore, it's completely valid to have at() being called after request name is called, as you may want to have more objects added during runtime.
Right, it's valid. I was mostly just thinking out loud about the panic part. :)
In GitLab by @elmarco on Sep 21, 2021, 21:15
Another painful similar case (while converting code that used to work, ..)
In p2p (not taking service name, of course), the msg receiver task starts immediately after build(), before I can setup the object server.
In GitLab by @danieldg on Sep 21, 2021, 21:30
Sounds like having the object server available in the builder would be a good idea.
Another painful similar case (while converting code that used to work, ..)
perhaps you should add good test cases for those so I'd know when I'm about to cause you pain?
In p2p (not taking service name, of course), the msg receiver task starts immediately after build(), before I can setup the object server.
The task only starts after the first time you access the ObjectServer
but of course currently you can only add one interface at a time and the task doesn't exactly wait for the first interface to be added either.
Sounds like having the object server available in the builder would be a good idea.
I think it's best what @elmarco suggested earlier here: explicit API to start (and maybe also stop) the task. Just thinking out loud: maybe give the task handle to the user so they can either detach()
it (which will give them mostly the current behavior), .await
it or do other things like join'ing it or select'ing it with other futures (think timeout).
mentioned in commit 9f32190e83f51ecc9c9b33ddac00e532d4633557
How about this plan:
I think that'd be a good starting point and will allow us to add future API later as needed w/o breaking API. e.g wait_timeout()
when we've the mechanism in place to monitor ObjectServer traffic.
In GitLab by @elmarco on Sep 20, 2021, 22:44
I observe an activation bug (somewhat reminiscent of #68), with a system activated service on fc34:
I get a reliable error on activation (subsequent calls ok while the service is running):
Interestingly, if I modify zbus to request the name after the server is setup, I don't get the race or the issue or the timeout from #68.