daemon stopping without reporting errors

francocatalano commented 3 years ago

After updating synda to v3.11 I ran into the following problem. I define a selection file to get some datasets from CMIP6, like this:

project=CMIP6
source_id=IPSL-CM6A-LR
experiment_id=historical
member_id=r1i1p1f1
table_id=Amon
frequency=mon
variable_id=tas psl pr hfls hfss evspsbl rlds rlus rsds rsus prw

The search command seems to work properly:

synda search -s selection/my_test_selfile.txt
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.hfss.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.tas.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rlus.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.pr.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rsus.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.psl.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rlds.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.evspsbl.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.rsds.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.hfls.gr.v20180803
new CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.prw.gr.v20180803

Then I launch install:

synda install -s selection/my_test_selfile.txt
11 file(s) will be added to the download queue.
Once downloaded, 1.3 GB of additional disk space will be used.
Do you want to continue? [Y/n] Y
11 file(s) enqueued
You can follow the download using 'synda watch' and 'synda queue' commands
The daemon is not running. To start it, use 'synda daemon start'.

But when I launch the daemon, it starts and then stops immediately, apparently without reporting any error:

synda daemon start
Handing over to daemon process, you can check the daemons logs at /sgi_specs/a/.synda/log/transfer.log

synda watch
Daemon not running

and the transfer log file reports only the following info:

2020-10-03 12:18:38,583 INFO SDDAEMON-001 Daemon starting ...
INFO: Connected to /sgi_specs/a/.synda/db/sdt.db
2020-10-03 12:18:38,584 INFO SDTSCHED-533 Connected to /sgi_specs/a/.synda/db/sdt.db
2020-10-03 12:18:38,584 INFO SDTSCHED-993 Starting watchdog..
2020-10-03 12:18:38,585 INFO SDFILDAO-200 get_files time is 0.000953, search select * from file where status=:status ORDER BY priority DESC, checksum with {'status': 'running'}
Daemon successfully started

Does anyone know what's going on? Thanks a lot for your help.

painter1 commented 3 years ago

Are there any files of the form /tmp/sdt_stacktrace_*.log? If one was written at about the right time, it might be revealing.
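For anyone who wants to check this quickly, here is a minimal Python sketch that lists recently modified stack trace files. The helper name `recent_stacktraces` and the one-hour window are assumptions made for illustration; only the /tmp/sdt_stacktrace_*.log pattern comes from this thread.

```python
import glob
import os
import time


def recent_stacktraces(pattern="/tmp/sdt_stacktrace_*.log",
                       max_age_seconds=3600, now=None):
    """Return (path, mtime) pairs for matching files modified within
    max_age_seconds of `now`, newest first."""
    now = time.time() if now is None else now
    hits = []
    for path in glob.glob(pattern):
        mtime = os.path.getmtime(path)
        if now - mtime <= max_age_seconds:
            hits.append((path, mtime))
    # Newest file first, so the trace closest to the crash is on top.
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, mtime in recent_stacktraces():
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp + " " + path)
```

Comparing the printed timestamps with the "Daemon starting" line in transfer.log should show whether a trace belongs to the failed start.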

francocatalano commented 3 years ago

Hi. Yes, this is the content of the corresponding /tmp/sdt_stacktrace_*.log:

Trace function called from '/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sddaemon.py' file in 'start' function at line 118
Exception occured at 2020-10-03 12:18:38.672102
Traceback (most recent call last):
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sddaemon.py", line 115, in start
    main_loop()
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sddaemon.py", line 66, in main_loop
    sdtaskscheduler.event_loop()
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sdtaskscheduler.py", line 164, in event_loop
    clear_failed_url()
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sdtaskscheduler.py", line 94, in clear_failed_url
    sdsqlutils.truncate_table("failed_url")
  File "/fas_c/UTENTI/franco/miniconda/miniconda2/envs/franco-synda-environment/lib/python2.7/site-packages/synda-3.11-py2.7.egg-info/scripts/sdsqlutils.py", line 49, in truncate_table
    conn.execute("delete from %s"%table)
OperationalError: no such table: failed_url

I am quite new to synda. Any suggestions? Thanks a lot.

painter1 commented 3 years ago

failed_url is a table that is now required in the database. In bash, type sqlite3. Then in sqlite3, type everything but "sqlite3>" here:

sqlite3>  CREATE TABLE failed_url ( url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER );
sqlite3>  CREATE UNIQUE INDEX idx_failed_url_1 ON failed_url (url);
sqlite3> .quit

Now I realize that we need an automated way to update the database like this, or to function without the table, or at least to issue a warning when it isn't there. I will work on that.
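One way such an automated check might look, as a minimal Python sketch and not Synda's actual migration code. The function names `table_exists` and `ensure_failed_url` are hypothetical. The SQL is the same as in the commands above, with IF NOT EXISTS added so it is safe to run twice. The path passed in is assumed to be the existing Synda database file (for example the sdt.db path printed in transfer.log).

```python
import os
import sqlite3

# Same schema as the manual commands, made idempotent with IF NOT EXISTS.
FAILED_URL_DDL = (
    "CREATE TABLE IF NOT EXISTS failed_url ("
    "url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER)",
    "CREATE UNIQUE INDEX IF NOT EXISTS idx_failed_url_1 ON failed_url (url)",
)


def table_exists(conn, name):
    """Return True if a table called `name` exists in the open database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone()
    return row is not None


def ensure_failed_url(db_path):
    """Create failed_url and its index if they are missing.

    Returns True if the table had to be created, False if it was already there.
    """
    # Refuse to run on a missing file: sqlite3.connect would otherwise
    # silently create a new, empty database at that path.
    if not os.path.isfile(db_path):
        raise IOError("database file not found: %s" % db_path)
    conn = sqlite3.connect(db_path)
    try:
        created = not table_exists(conn, "failed_url")
        for statement in FAILED_URL_DDL:
            conn.execute(statement)
        conn.commit()
        return created
    finally:
        conn.close()
```

A daemon start-up routine could call `ensure_failed_url` and log a warning whenever it returns True, which covers the "at least issue a warning" option as well.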

painter1 commented 3 years ago

BTW, the table "failed_url" is needed so that if a data node fails to supply data, Synda can go try another data node.

francocatalano commented 3 years ago

In bash, type sqlite3 Then in sqlite3 type everything but "sqlite3>" here:

sqlite3>  CREATE TABLE failed_url ( url_id INTEGER PRIMARY KEY, url TEXT, file_id INTEGER );
sqlite3>  CREATE UNIQUE INDEX idx_failed_url_1 ON failed_url (url);
sqlite3> .quit

I've just tried that, but when I launch the daemon again I still get the same error as before:

OperationalError: no such table: failed_url

I also tried deactivating and reactivating the synda environment after issuing the sqlite3 commands, but the problem is the same.

painter1 commented 3 years ago

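I'm sorry, I gave the wrong sqlite command. It should be

sqlite3 [path to your database]

Without a path, sqlite3 opens a temporary in-memory database, so the table was never created in the Synda database file.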

francocatalano commented 3 years ago

I'm sorry, I gave the wrong sqlite command. It should be sqlite3 [path to your database]

Now it worked. Thanks a lot.

painter1 commented 3 years ago

I’m trying to work out how this could happen. Was it a brand new database, with nothing in the file table?

Jeff

From: Rafael Abreu <notifications@github.com>
Sent: Wednesday, October 21, 2020 8:55 AM
To: Prodiguer/synda <synda@noreply.github.com>
Cc: Painter, Jeff <painter1@llnl.gov>; Mention <mention@noreply.github.com>
Subject: Re: [Prodiguer/synda] daemon stopping without reporting errors (#158)

I am having the same problem. I used the command provided by @painter1, but after that another problem seems to occur:

=============
Trace function called from '/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sddaemon.py' file in 'start' function at line 118
Exception occured at 2020-10-21 09:59:03.228342
Traceback (most recent call last):
  File "/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sddaemon.py", line 115, in start
    main_loop()
  File "/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sddaemon.py", line 66, in main_loop
    sdtaskscheduler.event_loop()
  File "/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sdtaskscheduler.py", line 166, in event_loop
    sdfiledao.highest_waiting_priority( True, True ) #initializes cache of max priorities
  File "/vol0/rafaleca/miniconda3/envs/synda-env/lib/python2.7/site-packages/synda-3.12-py2.7.egg-info/scripts/sdfiledao.py", line 228, in highest_waiting_priority
    return (highest_waiting_priority.vals).get(data_nodes[0],None)
IndexError: list index out of range


painter1 commented 3 years ago

This error involving sdfiledao.py line 228 (list index out of range) can be bypassed by editing sdconst.py to set GET_FILES_CACHING to False. However, if you have a large database and are simultaneously downloading from several data nodes, you will take a performance hit. I will try to reproduce the problem (I think an empty database will do it), fix it (should be very simple), and submit a pull request soon.
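To illustrate the failure mode: the line in the traceback indexes data_nodes[0], which raises IndexError when the list is empty. The sketch below shows the kind of guard that avoids this. It is an assumption about the shape of a fix, not the actual Synda patch, and `first_node_priority` is a hypothetical stand-in for the real function.

```python
def first_node_priority(cached_priorities, data_nodes):
    """Return the cached priority for the first data node.

    Returns None when no data nodes are known (for example with an empty
    database) instead of raising IndexError on data_nodes[0].
    """
    if not data_nodes:
        return None
    return cached_priorities.get(data_nodes[0], None)


if __name__ == "__main__":
    # Empty database case: no data nodes, so there is nothing to look up.
    print(first_node_priority({}, []))
    # Normal case: the priority cached for the first node is returned.
    print(first_node_priority({"node-a": 5}, ["node-a", "node-b"]))
```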

painter1 commented 3 years ago

@francocatalano and Rafael Abreu: the problem involving failed_url was supposedly fixed about a year and a half ago. There was about a one-month window in which this problem was clearly possible, although I can't entirely rule out a bug in that fix. So I have some questions:

Exactly how did you get the Synda version you have? On exactly what date was it downloaded?
What is SYNDA_VERSION in your sdconst.py (very near the end of the file)? What is version in sdapp.py (around the middle of the file, the line above sdapputils.set_exception_handler())?

What is your database version? You can get the version thus:

bash> sqlite3 [path to your synda database]
sqlite> SELECT version FROM version;
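The same query can also be run from Python, which may be handy where the sqlite3 command-line tool is not installed. This is a minimal sketch: the helper name `database_version` is hypothetical, and it assumes only what the query above implies, namely a table called version with a column called version.

```python
import sqlite3


def database_version(db_path):
    """Return the value stored in the version table, or None if the table
    is missing or empty."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT version FROM version").fetchone()
        return None if row is None else row[0]
    except sqlite3.OperationalError:
        # Raised by SQLite when the version table does not exist.
        return None
    finally:
        conn.close()
```

Call it with the path to your Synda database, for example `database_version("/path/to/sdt.db")`, where the path is a placeholder for your own installation.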

Thank you!

rafaelcabreu commented 3 years ago

Thanks for the update @painter1. I deleted my original comment because I was able to get it working by running the daemon after running synda install.

As for the versions, I am using synda version 3.12 installed with conda and sqlite3 version 3.33.0.

painter1 commented 3 years ago

@rafaelcabreu, the database itself has a version number, different from the sqlite3 version number. Would you please check that? Thanks.
