StevenCTimm closed this 1 year ago
Some time may be saved by combining the functions of several publishers into one. Seven of the nine publishers in the current configuration just write to Graphite, and I believe that underlying publisher can publish multiple data blocks within the same call if configured properly. If it can't, it ought to be modifiable to do so.
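To illustrate the idea of publishing several data blocks in one call, here is a minimal sketch. It assumes the data blocks can be reduced to plain {metric path: value} dicts and that Graphite is reached over its plaintext protocol (one "path value timestamp" line per metric, conventionally on port 2003). The names build_graphite_payload and publish_blocks, and the host and port defaults, are hypothetical and are not taken from the decision engine code.

```python
import socket
import time


def build_graphite_payload(data_blocks, timestamp=None):
    """Flatten several {metric_path: value} dicts into one plaintext payload."""
    ts = int(time.time() if timestamp is None else timestamp)
    lines = []
    for block in data_blocks:
        for path, value in sorted(block.items()):
            # Graphite plaintext format: "<path> <value> <timestamp>\n"
            lines.append(f"{path} {value} {ts}\n")
    return "".join(lines).encode("ascii")


def publish_blocks(data_blocks, host="localhost", port=2003):
    """Send every block over a single connection (hypothetical helper)."""
    payload = build_graphite_payload(data_blocks)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

With this shape, one publisher opens one connection per cycle instead of seven publishers each opening their own, which also means only one thing to shut down.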
It should also be noted that although the nominal reason for trying to run all the publishers at shutdown time is to de-advertise the decision engine classads from the factory, that functionality is not working at the moment either. (This was a very hard problem that was at one point solved.) The classads persist even though the long shutdown is happening.
@StevenCTimm, after doing some digging, I think part of the problem is that the behavior of --stop-channel was changed in DE 1.6.0 to always perform a clean shutdown of the channel. If the desire is to just kill the channel after so many seconds, then --kill-channel should be used instead. From de-client -h:
$ de-client -h
...
Channel-specific options:
  --start-channels      start all channels
  --stop-channels       stop all channels
  --start-channel <channel name>
  --stop-channel <channel name>
                        Attempt clean shutdown of channel.
  --kill-channel <channel name>
                        Same as --stop-channel, except the channel process
                        will be killed once the server's configured shutdown
                        timeout window is exceeded
  -f, --force           May be used with --kill-channel to immediately kill
                        the channel process
  --timeout <seconds>   May be specified with --kill-channel to override the
                        DE server's configured timeout window or max time to
                        wait for --block-while.
...
This doesn't address all of the issues that have been reported here, but it should explain the behavior for some of them.
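For readers unfamiliar with the distinction, the three behaviors described in the help text can be sketched with a generic child process. This is only an analogy built on Python's subprocess module, not the DE implementation: stop_or_kill and spawn_sleeper are hypothetical names, timeout=None stands in for --stop-channel (wait indefinitely for a clean exit), a numeric timeout stands in for --kill-channel, and force=True stands in for --force.

```python
import subprocess
import sys


def stop_or_kill(proc, timeout=None, force=False):
    """Return "stopped" if the process exited on request, else "killed"."""
    if not force:
        proc.terminate()  # polite request (SIGTERM on POSIX)
        try:
            proc.wait(timeout=timeout)  # timeout=None waits indefinitely
            return "stopped"
        except subprocess.TimeoutExpired:
            pass  # grace period exceeded, fall through to a hard kill
    proc.kill()  # SIGKILL on POSIX
    proc.wait()  # reap the process so it does not linger as a zombie
    return "killed"


def spawn_sleeper(seconds=60):
    """Hypothetical stand-in for a channel process: it just sleeps."""
    code = f"import time; time.sleep({seconds})"
    return subprocess.Popen([sys.executable, "-c", code])
```

The point of the sketch is that a clean stop with no timeout can block for as long as the process takes to finish, which matches the long waits reported in this thread.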
Ah...the reason systemctl stop decisionengine is taking so long is that the default timeout of 10 seconds is being applied to each source in addition to the channel. We'll have to fix that.
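If those waits happen one after another, a channel with N sources could take roughly (N + 1) × 10 seconds to stop in the worst case. One common way to avoid that compounding is to join every worker against a single shared deadline. The sketch below shows the pattern with plain threads; join_all is a hypothetical helper and this is not a description of how the actual fix was implemented.

```python
import threading
import time


def join_all(workers, timeout):
    """Join every worker against ONE shared deadline; return those still alive."""
    deadline = time.monotonic() + timeout
    for worker in workers:
        # Each join only gets whatever is left of the shared budget.
        remaining = max(0.0, deadline - time.monotonic())
        worker.join(remaining)
    return [w for w in workers if w.is_alive()]
```

Any workers returned by join_all missed the deadline and are candidates for a forced kill, so the total wait stays bounded by the one timeout regardless of how many sources there are.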
PR #636 should address the issue of systemctl stop decisionengine taking 2-3 minutes.
The initial problem, namely that de-client --stop-channel ; de-client --start-channel doesn't work, is still present. systemctl stop decisionengine also still takes far too long (about 2 minutes).
@StevenCTimm, did you see my comment above re. --stop-channel vs. --kill-channel? Also, --start-channel will block until the channel is STEADY. What is the desired behavior?
Surprised to see that systemctl stop decisionengine is still taking 2 minutes. That should have been addressed by #636, but I may have to check that.
This is now addressed in PR 648. As soon as that is merged and in a release, we should be good.
de-client --stop-channel resource_request took more than an hour and still didn't shut down the channel. This may simply be because we are trying to shut down ALL the publishers, none of the publishers had any data to publish, and they are all now configured to retry a large number of times.
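One way a retry loop can avoid holding up shutdown is to check a stop flag between attempts and to use that flag as the sleep. The following is a minimal sketch of that pattern, assuming each publisher runs in a thread that can be handed a threading.Event. The function name, its parameters, and the returned strings are hypothetical and do not come from the DE code base.

```python
import threading


def publish_with_retries(publish, stop_event, max_retries=5, delay=1.0):
    """Call publish() until it succeeds, retries run out, or stop is requested."""
    for attempt in range(1, max_retries + 1):
        if stop_event.is_set():
            return "aborted"
        try:
            publish()
            return "published"
        except Exception:
            if attempt == max_retries:
                break  # no point sleeping after the final attempt
            # wait() doubles as an interruptible sleep: it returns True
            # as soon as stop_event is set, otherwise False after `delay`.
            if stop_event.wait(delay):
                return "aborted"
    return "gave up"
```

With this shape, setting the event during shutdown ends the loop within one wake-up instead of after the full retry budget.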
DE 1.7 and earlier had a configurable timeout after which the shutdown of the channel would give up and just kill it. DE 2.0 has it too, but it does not seem to be working.
Note also that systemctl stop decisionengine takes much too long, 2-3 minutes on average.