ESPRI-Mod / synda

ESGF Downloader (this is a deprecated repository, the tool has now moved to https://github.com/ESGF/esgf-download)
https://espri-mod.github.io/synda/

Daemon stops on its own after using install #95

Closed meteorologist15 closed 3 years ago

meteorologist15 commented 6 years ago

Sometimes I have used the "synda install ..." feature and it worked fine. At other times, the data requests have gotten stuck waiting in the queue. This is because the daemon stops running about 10 seconds after it is started.

synda daemon start
synda daemon status
Daemon running
(10 seconds later)
synda daemon status
Daemon not running

Please address this issue. My guess is that the daemon is supposed to ALWAYS be running unless I EXPLICITLY tell it to stop (i.e. synda daemon stop). This is not what is happening.

System: RHEL7
Shell: csh
synda version: 3.8

which python
$ST_HOME/bin/python

painter1 commented 6 years ago

It is possible for an uncaught exception to crash the daemon. I have patched our local copy of Synda 3.9 to prevent the ones I have encountered, but I don't know whether they would help you. Could you look in /var/log/synda/sdt/.log and /tmp/sdtstacktrace.log ?

meteorologist15 commented 6 years ago

This was my stacktrace log:

=============
Trace function called from '/local2/home/synda/sdt/lib/sd/sddaemon.py' file in 'start' function at line 116
Exception occured at 2018-05-22 17:05:35.859474
Traceback (most recent call last):
  File "/local2/home/synda/sdt/lib/sd/sddaemon.py", line 113, in start
    main_loop()
  File "/local2/home/synda/sdt/lib/sd/sddaemon.py", line 65, in main_loop
    sdtaskscheduler.event_loop()
  File "/local2/home/synda/sdt/lib/sd/sdtaskscheduler.py", line 193, in event_loop
    run_soft_tasks()
  File "/local2/home/synda/sdt/lib/sd/sdtaskscheduler.py", line 133, in run_soft_tasks
    sdtask.transfers_begin()
  File "/local2/home/synda/sdt/lib/sd/sdtask.py", line 148, in transfers_begin
    dmngr.transfers_begin(transfers)
  File "/local2/home/synda/sdt/lib/sd/sddmgo.py", line 149, in transfers_begin
    _, _, access_token = api_client.goauth.get_access_token(username=globus_username, password=globus_password)
  File "/local2/home/synda/sdt/lib/python2.7/site-packages/globusonline/transfer/api_client/goauth.py", line 87, in get_access_token
    raise GOCredentialsError()
GOCredentialsError: Wrong username or password

I looked into it more, and it appears that in my sdt.conf file I had set "globustransfer" to true but had not set up a Globus username or password. Therefore, when running "synda install", sddmgo.py called into goauth.py, which raised a GOCredentialsError and stopped the daemon. Perhaps this issue could be fixed in the next release?
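
To illustrate the kind of fix being asked for, here is a minimal sketch of a task loop that logs a failing task instead of letting the exception end the whole process. It is not Synda's code: `run_tasks_safely`, `CredentialsError` and the two example tasks are hypothetical names used only for this illustration.

```python
import logging
import traceback

logger = logging.getLogger("daemon_sketch")


class CredentialsError(Exception):
    """Hypothetical stand-in for an authentication failure such as GOCredentialsError."""


def run_tasks_safely(tasks):
    """Run each (name, callable) pair; log failures instead of propagating them.

    Returns the names of the tasks that raised, so the caller can report them.
    """
    failed = []
    for name, task in tasks:
        try:
            task()
        except Exception:
            # Record the full traceback, then move on to the next task.
            logger.error("task %s failed:\n%s", name, traceback.format_exc())
            failed.append(name)
    return failed


def broken_transfer():
    # Simulates a transfer backend that is enabled but has no credentials.
    raise CredentialsError("Wrong username or password")


def healthy_transfer():
    return "ok"


failed = run_tasks_safely([("globus", broken_transfer), ("http", healthy_transfer)])
print(failed)
```

With this structure the failing task is reported and the remaining tasks still run. Whether swallowing the error is appropriate for a given task is a design decision; a misconfiguration like the one above could arguably also be rejected at startup instead.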

painter1 commented 6 years ago

I haven't seen this one before because I haven't tried to use Globus - only GridFTP and HTTP.

Until this is fixed, you may want to restart the daemon automatically. Of course there is no point to that if all your downloads are Globus. Here's the script which I use to keep the daemon running:

#!/bin/bash
#
# This script starts the Synda (sdt) daemon every five minutes, in case it has died.
# cd to ~/synda_daemon and run under nohup if you don't want to see the output.

while :
do
  if pgrep sddaemon > /dev/null 2>&1
    then
      echo `date` ok
    else
      echo `date` starting daemon
      /etc/init.d/sdt start
  fi
  sleep 300
done

meteorologist15 commented 6 years ago

I have indeed written a script that is further automated by a crontab to restart the daemon every 5 minutes as you suggested. I apologize for the confusion, as I have been using GridFTP and HTTP for my transfers, not the standalone "Globus" application (i.e. GridFTP utilizes globus-url-copy, hence the confusion).

The other issue is that, when the daemon is restarted, the download that was running when it stopped is also terminated. I discovered that the stoppage results from an AssertionError raised in sdevent.py. Full traceback below:

=============
Trace function called from '/local2/home/synda/sdt/lib/sd/sddaemon.py' file in 'start' function at line 116
Exception occured at 2018-06-11 11:35:27.333802
Traceback (most recent call last):
  File "/local2/home/synda/sdt/lib/sd/sddaemon.py", line 113, in start
    main_loop()
  File "/local2/home/synda/sdt/lib/sd/sddaemon.py", line 65, in main_loop
    sdtaskscheduler.event_loop()
  File "/local2/home/synda/sdt/lib/sd/sdtaskscheduler.py", line 195, in event_loop
    run_hard_tasks()
  File "/local2/home/synda/sdt/lib/sd/sdtaskscheduler.py", line 124, in run_hard_tasks
    sdtask.transfers_end()
  File "/local2/home/synda/sdt/lib/sd/sdtask.py", line 72, in transfers_end
    dmngr.transfers_end()
  File "/local2/home/synda/sdt/lib/sd/sddmdefault.py", line 236, in transfers_end
    end_of_transfer(task)
  File "/local2/home/synda/sdt/lib/sd/sddmdefault.py", line 217, in end_of_transfer
    sdevent.file_complete_event(tr) # trigger 'file complete' event
  File "/local2/home/synda/sdt/lib/sd/sdevent.py", line 74, in file_complete_event
    variable_complete_event(tr.project,tr.model,tr.dataset,tr.variable) # trigger 'variable complete' event
  File "/local2/home/synda/sdt/lib/sd/sdevent.py", line 97, in variable_complete_event
    assert '/output/' not in dataset.path
AssertionError

As a follow-up, I altered my config file so that the data would be saved under $ST_HOME/tmp2 (i.e. /local2/home/synda/sdt/tmp2). Inside tmp2 is a 'cmip5' directory, and inside 'cmip5' are an 'output' and an 'output1' directory. Is this perhaps causing the issue? Should I comment out this assertion? Should I wrap it in a try/except statement?
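
As a quick check of which paths the assertion in the traceback would reject, the substring test can be run on its own. This is a sketch only: the helper name `trips_assertion` and the trailing `example` path component are hypothetical, and what `dataset.path` actually contains in Synda is an assumption here.

```python
def trips_assertion(path):
    """Return True when `assert '/output/' not in path` would fail for this path."""
    return '/output/' in path


# Directory names taken from the layout described above; the trailing
# "example" component is hypothetical.
paths = [
    "/local2/home/synda/sdt/tmp2/cmip5/output/example",
    "/local2/home/synda/sdt/tmp2/cmip5/output1/example",
    "/local2/home/synda/sdt/tmp2/cmip5/output",
]

for p in paths:
    print(p, trips_assertion(p))
```

Only the first path contains the exact substring '/output/'. The 'output1' path does not match, and neither does a path that merely ends in 'output' without a trailing slash.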

ericnienhouse commented 5 years ago

I realize this is an old thread. However, I find this line in /local2/home/synda/sdt/lib/sd/sdevent.py:

"assert '/output/' not in dataset.path"

to be a problem, as there are over 8000 (CCCma) CMIP5 datasets that are of product "output". This causes synda to crash frequently.

I'm curious @meteorologist15 if you found a solution to this.

Zeitsperre commented 4 years ago

Ran into this issue today and couldn't for the life of me figure out why this was happening. Do you suppose that patching out that assertion for the CCCMA use case would be sufficient?
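
For anyone weighing that patch, one possible shape is to turn the hard assertion into a logged warning. The sketch below is not Synda's code: `check_dataset_path` and its `strict` flag are hypothetical, and the thread does not say why the original assertion exists, so whether it is safe to relax remains an open question.

```python
import logging

logger = logging.getLogger("sdevent_sketch")


def check_dataset_path(path, strict=False):
    """Return True when the path passes the '/output/' check.

    strict=True mimics the original assert by raising AssertionError;
    strict=False only logs a warning so the caller can continue.
    """
    ok = '/output/' not in path
    if not ok:
        if strict:
            raise AssertionError("'/output/' found in dataset path: %s" % path)
        logger.warning("'/output/' found in dataset path: %s", path)
    return ok


# Hypothetical path for a dataset whose product is "output".
print(check_dataset_path("cmip5/output/example"))
```

In the non-strict mode the function returns False for such a path and leaves a trace in the log, so the condition stays visible without stopping the process.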

Thanks for continuing to support an essential data management tool!

pjournou-ipsl commented 3 years ago

In the future 3.4 release, the features based on the daemon will be deprecated and a new feature for asynchronous downloads will be implemented. Other information: only the HTTP protocol is now supported by Synda. Globus features have also been deprecated.