budrus opened 3 years ago
@budrus iceperf already does it more or less this way. One application only sends data after it has received data from the other, so we already have a blocking wait for the publisher.
@elBoberido That's somewhat right, but only because we have the ping-pong back channel. For a throughput measurement I think other middlewares send as fast as possible and then check whether everything was received and how much arrived in a fixed time. So I would prefer to also have a setup close to what people do when testing iceoryx: send like crazy, as fast as possible, and see if any data is lost and what throughput it achieves.
@budrus okay, with this approach there will almost certainly be data loss when the publisher is not blocked, since the queue sizes are way too small to hold enough samples even if the OS stops the subscriber for only a few milliseconds.
I would suggest the following approach:

* `blockingPush`, `timedPush`, `tryPush` as pendants to `blockingPop`, `timedPop`, `tryPop`
* `I_WANT_IT_ALL`
* `DONT_STOP_ME_NOW`
* or is set to `THE_SHOW_MUST_GO_ON`
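A minimal sketch of what the suggested push counterparts could look like on a bounded queue; this is illustrative only (the class name and internals are made up here, not the iceoryx implementation):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical bounded queue sketching the proposed blockingPush/timedPush/
// tryPush variants as pendants to the pop variants. Not iceoryx code.
template <typename T>
class BoundedQueue
{
  public:
    explicit BoundedQueue(std::size_t capacity) : m_capacity(capacity) {}

    // blocks until a free slot is available
    void blockingPush(const T& value)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [&] { return m_queue.size() < m_capacity; });
        m_queue.push_back(value);
    }

    // waits at most 'timeout' for a free slot; returns false on timeout
    bool timedPush(const T& value, std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (!m_notFull.wait_for(lock, timeout, [&] { return m_queue.size() < m_capacity; }))
        {
            return false;
        }
        m_queue.push_back(value);
        return true;
    }

    // never blocks; returns false if the queue is full
    bool tryPush(const T& value)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.size() >= m_capacity)
        {
            return false;
        }
        m_queue.push_back(value);
        return true;
    }

    // never blocks; returns false if the queue is empty
    bool tryPop(T& value)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty())
        {
            return false;
        }
        value = m_queue.front();
        m_queue.pop_front();
        m_notFull.notify_one();
        return true;
    }

  private:
    std::size_t m_capacity;
    std::deque<T> m_queue;
    std::mutex m_mutex;
    std::condition_variable m_notFull;
};
```

With such an interface, a blocking publisher would simply call `blockingPush` on the subscriber queue instead of overwriting the oldest sample.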
This is great. Anything which doesn’t force or imply a threading model on clients is awesome.
Hackathon:
@budrus with an impossible task
@ithier at least you realized that I got the hardest job
@budrus regarding the issue with stopping an application with a blocked publisher. I think there are two options
What do you think? Do you have another idea?
> @budrus regarding the issue with stopping an application with a blocked publisher. I think there are two options
>
> * option 1
>   * use a flag in the runtime set by the signal handler
>   * use that flag in the keep-alive thread to send an IPC message to RouDi
>   * RouDi disconnects the publisher ports

I think I prefer option 1, as we should avoid duplicating the publisher list. However, I have mixed feelings about this topic. It feels very hacky in a way. Would it be possible to only do this fix on the `release_1.0` branch and solve it on `master` altogether in #611 with our new concept for object creation in shared memory?
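A minimal sketch of how option 1's flag mechanism could look; the IPC helper named here is a hypothetical placeholder, not iceoryx API:

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// Illustrative sketch of option 1 (not iceoryx code): the signal handler only
// sets an atomic flag, which is async-signal-safe. The keep-alive thread polls
// the flag and would then send an IPC message so that RouDi disconnects the
// publisher ports. sendIpcMessageToRouDi is a hypothetical placeholder.
std::atomic<bool> g_shutdownRequested{false};

extern "C" void shutdownHandler(int)
{
    // only async-signal-safe operations are allowed in a signal handler
    g_shutdownRequested.store(true, std::memory_order_relaxed);
}

void keepAliveLoopIteration()
{
    if (g_shutdownRequested.load(std::memory_order_relaxed))
    {
        // hypothetical: ask RouDi to disconnect this process' publisher ports
        // sendIpcMessageToRouDi("DISCONNECT_PUBLISHER_PORTS");
    }
    // ... send the regular keep-alive message here ...
}
```

The key design point is that the handler itself does no IPC; all the real work happens on the already-running keep-alive thread.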
> I think I prefer option 1, as we should avoid duplicating the publisher list. However, I have mixed feelings about this topic. It feels very hacky in a way. Would it be possible to only do this fix on the `release_1.0` branch and solve it on `master` altogether in #611 with our new concept for object creation in shared memory?
Yes, it's kind of hacky, but I wouldn't do it only in the `release_1.0` branch, in order to keep the branches in sync for as long as possible and also to not keep this regression on `master`. This is the make it work -> make it beautiful -> make it fast cycle ;)
@elBoberido @mossmaurice I would also vote for option 1. I fear that ugly things could happen if we do another round of bookkeeping. Having a `runtime.shutdown()` that sends a command over UDS and ends up doing the things on the RouDi side that you made to solve the challenge is maybe the best for now. That way we have a bit of reuse. Setting something in an individual publisher to release it feels even more ugly, and more ideas I do not have.
@elBoberido @mossmaurice @MatthiasKillat @elfenpiff I think this `runtime.shutdown()` could also be used once we extend the `WaitSet`/`Listener` to be used without an explicit shutdown trigger. The runtime could then automatically register a signal handler for this, and if the user wants a custom signal handler she must also call `runtime.shutdown()`. I think this could also be used to simplify our examples even more, like
```cpp
int main() {
    auto& runtime = Runtime::init(...);
    Publisher<uint8_t> pub{...};
    while (!runtime.shutdownRequested()) {
        pub.publishCopyOf(42U);
        runtime.sleep(...); // this would be interruptible by runtime.shutdown()
    }
}
```
or with the listener

```cpp
auto& runtime = Runtime::init(...);
Listener listener;
Subscriber<uint8_t> sub{...};
listener.attachEvent(sub, ...);
runtime.wait(); // blocking wait which will be unblocked by runtime.shutdown()
```
What do you think? Shall I create an issue for this?
@elBoberido it's getting closer to `while(ros::ok())` ;-) I like the idea. As you wrote, it should be optional whether our runtime does the signal handling. E.g. if iceoryx is used in ROS, we would not have the signal handler on our side but would do a `runtime.shutdown()` in a place like `rmw_shutdown()`. Go for it!
@elBoberido So would you propose to leave it for now for Almond and we create a new issue and close this one?
@budrus I would create the `runtime.shutdown()` method with option 1 for unblocking the application at shutdown, and an issue for the other stuff like `runtime.registerShutdownSignalHandler()`, `runtime.ok()`, `runtime.sleep()` and `runtime.wait()`.
I'd like to have the unblocking of the blocked publisher in 1.0 and all the other stuff is nice to have and can be implemented later on. This could also be a good first issue for new contributors.
Brief feature description
Today our default is to use an "overflowing queue" for the subscribers. If a subscriber does not consume fast enough, we start losing samples. An option to block the publisher in this case would be nice, to ensure that no samples are lost.
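To make the requested behavior concrete, here is a toy model contrasting the current overflowing queue with the proposed blocking option; the class and enum names are made up for illustration and are not iceoryx API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model of the two queue-full behaviours discussed in this issue.
enum class QueueFullPolicy
{
    DISCARD_OLDEST, // current default: overwrite like a ring buffer
    BLOCK_PUBLISHER // requested option: no sample is ever lost
};

struct SubscriberQueue
{
    explicit SubscriberQueue(std::size_t cap) : capacity(cap) {}

    // returns true if the sample was stored; with BLOCK_PUBLISHER a full
    // queue returns false, modelling a publish() call that would have to wait
    bool push(uint8_t sample, QueueFullPolicy policy)
    {
        if (samples.size() == capacity)
        {
            if (policy == QueueFullPolicy::DISCARD_OLDEST)
            {
                samples.pop_front(); // the oldest sample is silently lost
            }
            else
            {
                return false; // a real implementation would block until a pop
            }
        }
        samples.push_back(sample);
        return true;
    }

    std::size_t capacity;
    std::deque<uint8_t> samples;
};
```

With `DISCARD_OLDEST` the publisher never waits but old samples vanish; with `BLOCK_PUBLISHER` no sample is dropped at the cost of back-pressure on the publisher.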
Detailed information
The overflowing queue starts to drop the oldest sample in case of an overflow, so technically it behaves like a ring buffer. In many use cases this is fine, as we want to have a "provide the last X samples" contract. E.g. if a subscriber is only interested in the latest data, it can set the queue size to 1 and we don't waste memory chunks on samples that are not interesting for the subscriber. We often also do not want interference from a subscriber back to the publisher. So if the subscriber is not fast enough to consume all samples, one solution could be to decrease the runtime of the subscribing application. But there might also be use cases where it is fine to slow down the publisher to ensure that no data is lost in our system. The solution would be to block the `publish()` call when we detect a queue overflow, until the subscriber has popped samples and there is again a free slot in the queue. Sure, this has an influence on the publishing application and also on other subscribers that are connected to this publisher. This is comparable to the DDS history QoS KeepAll; the normal behavior with our overflowing queue is comparable to the DDS history QoS KeepLast X.

ToDo
When implementing this, add the following integration tests:
* `ChunkDistributor`: https://github.com/eclipse-iceoryx/iceoryx/pull/663#discussion_r606655415
* `TriggerQueue`: https://github.com/eclipse-iceoryx/iceoryx/pull/663#discussion_r606653889
* `Ctrl+C` on an application with a publisher blocked by a slow subscriber doesn't shut down when a signal handler is installed; this is due to the `while (!remainingQueues.empty())` in `ChunkDistributor::deliverToAllStoredQueues`, which is not stopped when `SIG_TERM` has a custom signal handler
* `Runtime::unblockShutdown` could be implemented
* `processKillDelay` in the `RouDi::shutdown` method: `m_prcMgr->requestShutdownOfAllProcesses();`
* RouDi has to make all the publishers stop offering so that the discovery loop can remove the subscriber queues from the `ChunkDistributor` of the publisher
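One way the blocked delivery loop mentioned above could be made interruptible is to additionally check an unblock flag, set by something like the proposed `Runtime::unblockShutdown`. This is an illustrative sketch with made-up names, not the actual `ChunkDistributor` code:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>

// Sketch of an interruptible delivery loop: besides draining the remaining
// queues, it checks a flag that a shutdown path could set to break the loop.
std::atomic<bool> g_unblockRequested{false};

// returns the number of queues still undelivered when the loop ended
std::size_t deliverToAllRemainingQueues(std::deque<int>& remainingQueues)
{
    while (!remainingQueues.empty() && !g_unblockRequested.load())
    {
        // deliver the chunk to remainingQueues.front() ... (omitted)
        remainingQueues.pop_front();
    }
    return remainingQueues.size();
}
```

In normal operation the flag stays false and the loop drains every queue; at shutdown, setting the flag lets the blocked publisher return so the process can terminate.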