painter1 closed this 3 years ago
FWIW, this same condition has occurred recently on our synda instance at NCAR. The last log message was "Renew certificate", and "synda watch" produced no results even though the queue showed a number of files waiting. The certificate request was to the myproxy instance at esgf-node.llnl.gov.
I think that this happens, at least sometimes, when a network error during a query related to OpenID or certificates raises an exception which kills the daemon. Something in the system is then still running when you start the daemon again. The other way to kill a daemon is to explicitly stop it, which calls sddaemon.stop(). This cleans up stray processes and deletes any "orphan pidfile".
So I would recommend explicitly starting and then stopping the daemon any time it has crashed.
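If you want to check by hand whether a crash left an orphan pidfile behind, something like the following might help. This is only a sketch of the usual POSIX approach (send signal 0 to the recorded PID), not what sddaemon.stop() actually does. The function name pidfile_is_orphaned is made up for illustration, and you have to supply the pidfile path for your own installation.

```python
import os


def pidfile_is_orphaned(path):
    """Return True if `path` exists but no live process owns the PID in it.

    Hypothetical helper, POSIX only; not part of Synda.
    """
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        # No pidfile at all, so nothing is orphaned.
        return False
    except ValueError:
        # File exists but does not contain a PID: treat it as stale.
        return True

    if pid <= 0:
        return True

    try:
        # Signal 0 performs the existence/permission check without
        # actually delivering a signal to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        # No such process: the pidfile outlived its daemon.
        return True
    except PermissionError:
        # The process exists but belongs to another user.
        return False
    return False
```

Note that a PID can be reused by an unrelated process, so a False result does not prove the daemon itself is alive.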
Relevant to all of this is sdlogon.renew_certificate(). There is also a function sdlogon.renew_certificate_with_retry(). We would have fewer crashes if the *_with_retry() function were called instead of the plain renew_certificate(). Judging by the comments, the reason why it isn't being used is that it is hard for sddaemon.stop() to kill it!
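To illustrate the trade-off, here is a rough sketch of a retry loop that stays easy to stop. It is not the code in sdlogon. The names call_with_retry, action and stop_event are invented for this example, and I am assuming the renewal failure shows up as an OSError (or whatever you pass in retry_on). The idea is to wait on a threading.Event instead of sleeping, so a stop request can interrupt the wait between attempts.

```python
import threading


def call_with_retry(action, attempts=3, delay=5.0, stop_event=None,
                    retry_on=(OSError,)):
    """Call action() until it succeeds, attempts run out, or a stop is requested.

    Hypothetical helper for illustration; not part of Synda.
    """
    if stop_event is None:
        stop_event = threading.Event()

    last_error = None
    for attempt in range(1, attempts + 1):
        if stop_event.is_set():
            break
        try:
            return action()
        except retry_on as exc:
            last_error = exc
            # Event.wait() returns True as soon as the event is set, so a
            # stop request cuts the pause short instead of blocking shutdown.
            if attempt < attempts and stop_event.wait(delay):
                break

    if last_error is None:
        raise RuntimeError("stopped before the first attempt")
    raise last_error
```

With this shape, the stop path would only need to call stop_event.set(); the wrapped call (for example the certificate renewal) is then retried on transient network errors but gives up promptly on shutdown.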
In the upcoming 3.4 release (planned for the end of 2021), the daemon-based features will be deprecated, and a new feature for asynchronous downloads will be implemented.
A few times recently, Synda has completely stopped doing anything even though the daemon kept running. Nothing appears in the log after a "Renew certificate" message. Thus it seems that automatic certificate renewal is not working in my environment. I do not know whether this is an issue with Synda or the environment in which I run it. My solution is to stop the daemon, run myproxy-logon (I routinely use the -b option), then start the daemon.