ESPRI-Mod / synda

ESGF Downloader (this is a deprecated repository, the tool has now moved to https://github.com/ESGF/esgf-download)
https://espri-mod.github.io/synda/

synda daemon and synda reset not responsive #195

Closed xlevine closed 2 years ago

xlevine commented 2 years ago

Hello, I am in the process of downloading a large amount of data, but the download has stopped and I am unable to control the synda daemon.

Here is the summary of the data transfer currently happening:

> synda queue
status    count   size
done      6400    1.0 TB
error     911     32.7 GB
running   5       299.3 MB
waiting   90584   3.0 TB

The problem is that no new data has been downloaded for nearly a week, although the daemon is currently active:

> synda daemon status
Daemon running

I was thinking about restarting the install, but when I try to stop the daemon, it does not work:

> synda daemon stop
Traceback (most recent call last):
  File "/cluster/projects/nn8002k/conda/synda-env/bin/synda", line 33, in <module>
    sys.exit(load_entry_point('synda==3.35', 'console_scripts', 'synda')())
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/main.py", line 196, in run
    status = sdtiaction.actionsargs.subcommand
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sdtiaction.py", line 180, in daemon
    sddaemon.stop()
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sddaemon.py", line 141, in stop
    if psutil.pid_exists(pid):
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/psutil/__init__.py", line 1375, in pid_exists
    if pid < 0:
TypeError: '<' not supported between instances of 'NoneType' and 'int'

Would you have any idea about what is happening?

I am also unable to do a synda reset:

> synda reset
Error occured at 2021-12-03 15:10:56.402808
Traceback (most recent call last):
  File "/cluster/projects/nn8002k/conda/synda-env/bin/synda", line 33, in <module>
    sys.exit(load_entry_point('synda==3.35', 'console_scripts', 'synda')())
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/main.py", line 196, in run
    status = sdtiaction.actionsargs.subcommand
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sdtiaction.py", line 451, in reset
    sddeletefile.reset()
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sddeletefile.py", line 118, in reset
    nbr=sddeletequery.purge_error_and_waiting_transfer()
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sddeletequery.py", line 50, in purge_error_and_waiting_transfer
    c.execute(
sqlite3.OperationalError: disk I/O error

Thank you very much for any help you can provide.

Best, Xavier.

pjournou-ipsl commented 2 years ago

Can you check whether a daemon.pid file exists in the $ST_HOME/tmp directory of your synda environment? According to your description, it should exist. If it does not, I would not really understand what is happening at this stage.

If it exists, please follow these steps:

Step 1 / Run "more daemon.pid" to retrieve the process ID (e.g. 12345). You can then run, for example, "ps -aux | grep 12345" to check whether that process is running on your system.

Step 2 / If the process is not running, you can safely remove the daemon.pid file (rm daemon.pid).

Step 3 / Check that "synda daemon status" now displays: Daemon not running.

Step 4 / Restart your downloads with a new daemon: "synda daemon start"

If these actions don't solve your problem, please don't hesitate to contact us again.
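For readers who prefer to script the stale PID file check, here is a minimal Python sketch of the same idea. It is not part of synda: the function names are hypothetical, the path to daemon.pid has to be supplied by the caller, and the existence test relies on POSIX signal 0. It also treats an empty or unreadable PID file as stale, which guards against the kind of missing PID value that the TypeError in the first traceback points to.

```python
import os
from pathlib import Path


def read_pid(pid_file):
    """Return the integer PID stored in pid_file, or None if missing or invalid."""
    try:
        return int(Path(pid_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def process_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_pid_file(pid_file):
    """Delete pid_file only if it does not refer to a live process.

    Returns True if the file was deleted, False otherwise.
    """
    path = Path(pid_file)
    if not path.exists():
        return False
    if process_is_running(read_pid(path)):
        return False
    path.unlink()
    return True
```

Calling remove_stale_pid_file on the daemon.pid file under $ST_HOME/tmp would correspond to steps 1 and 2 above; steps 3 and 4 are still done with the synda commands.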

xlevine commented 2 years ago

Hi @pjournou-ipsl , Many thanks for your prompt response, and my apologies for my delayed reply due to computer issues at my center.

After some further investigation, I realized I was running out of space in my install directory, which led to the disk I/O error with the daemon.

This means I now have to do the install in a different directory. To start from a clean slate, I decided to delete .synda and reinstall it in a new directory. Despite this, I now get the following error:

> synda reset
Error occured at 2021-12-10 12:15:16.678107
Traceback (most recent call last):
  File "/cluster/projects/nn8002k/conda/synda-env/bin/synda", line 33, in <module>
    sys.exit(load_entry_point('synda==3.35', 'console_scripts', 'synda')())
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/main.py", line 196, in run
    status = sdtiaction.actionsargs.subcommand
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sdtiaction.py", line 450, in reset
    from synda.sdt import sddeletefile
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sddeletefile.py", line 18, in <module>
    from synda.sdt import sddao
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sddao.py", line 17, in <module>
    from synda.sdt import sddb
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sddb.py", line 94, in <module>
    connect()
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sddb.py", line 50, in connect
    sddbobj.create_tables(conn)
  File "/cluster/projects/nn8002k/conda/synda-env/lib/python3.8/site-packages/synda/sdt/sddbobj.py", line 48, in create_tables
    conn.execute("create table if not exists file (file_id INTEGER PRIMARY KEY, url TEXT, file_functional_id TEXT, filename TEXT, local_path TEXT, data_node TEXT, checksum TEXT, checksum_type TEXT, duration INT, size INT, rate INT, start_date TEXT, end_date TEXT, crea_date TEXT, status TEXT, error_msg TEXT, sdget_status TEXT, sdget_error_msg TEXT, priority INT, tracking_id TEXT, model TEXT, project TEXT, variable TEXT, last_access_date TEXT, dataset_id INT, insertion_group_id INT, timestamp TEXT)")
sqlite3.OperationalError: database is locked

Any other synda command gives me the same "database is locked" error. Do you have any idea how to resolve it? Any help would be much appreciated, as I am completely stuck otherwise. Best, Xavier.

pjournou-ipsl commented 2 years ago

It seems that your local environment is broken. I suggest running the following commands to create a new one:

1 / $ synda init-env (you will have to set your credentials again, sorry...)
2 / $ synda check-env [Would you like to set your openID credentials? y/n: n]

At the end, this message must appear: Check complete.

This is the best way to be sure that your local synda environment is ready to be used.

The "synda reset" sub-command is then used to remove all 'waiting' and 'error' transfers from the DB. But I have just tested it with an empty DB and it has no effect, so I don't really understand your DB lock...

xlevine commented 2 years ago

Thanks for your quick response! Unfortunately, I have tried what you just suggested and it did not work. Afterward I reinstalled both synda-env and synda itself, but the sqlite3.OperationalError: database is locked keeps appearing. I checked on the web and I have not found any clear answer to this problem. So I am completely stuck right now. If you have any other suggestion, I would much appreciate it. My apologies for bothering you with this issue! Best, Xavier

pjournou-ipsl commented 2 years ago

Are you sure that synda points to your new empty DB? Check the path in your ST_HOME environment variable. You can also remove all the files (*.db) in your ST_HOME/db directory and run another test (synda will create a new DB automatically). Sorry for my trivial suggestions. I am really trying to understand the root of your problem.
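As a quick way to see what the ST_HOME check amounts to, here is a small Python sketch. It assumes only what is stated in this thread, namely that the databases are *.db files in the db directory under ST_HOME; the function name list_db_files is hypothetical and not part of synda.

```python
import os
from pathlib import Path


def list_db_files(st_home):
    """Return the sorted *.db files found in <st_home>/db (empty list if none)."""
    db_dir = Path(st_home) / "db"
    if not db_dir.is_dir():
        return []
    return sorted(db_dir.glob("*.db"))


if __name__ == "__main__":
    st_home = os.environ.get("ST_HOME")
    if st_home is None:
        print("ST_HOME is not set")
    else:
        print("ST_HOME =", st_home)
        for db_file in list_db_files(st_home):
            # File size helps tell a fresh, empty DB from an old one
            print(db_file, db_file.stat().st_size, "bytes")
```

Running it in the same shell (and the same conda environment) as synda shows which directory the tool would actually use, which matters if a startup script overrides the variable.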

pjournou-ipsl commented 2 years ago

I note that you are still using the same conda environment, "synda-env". Another idea is to create a new conda environment and see whether synda works in it. It does not take long and would help to evaluate this other possible cause of your problem.

xlevine commented 2 years ago

Hi @pjournou-ipsl, thanks again for your help. I have tried everything you described: setting up a new environment ("synda-environment"), deleting the files in the db directory and re-initializing synda, but the same problem still arises. Below is the error message I get every time I run a synda command:

> synda queue
Error occured at 2021-12-11 08:58:49.747170
Traceback (most recent call last):
  File "/cluster/home/xale/.conda/envs/synda-environment/bin/synda", line 33, in <module>
    sys.exit(load_entry_point('synda==3.35', 'console_scripts', 'synda')())
  File "/cluster/home/xale/.conda/envs/synda-environment/lib/python3.8/site-packages/synda/sdt/main.py", line 196, in run
    status = sdtiaction.actionsargs.subcommand
  File "/cluster/home/xale/.conda/envs/synda-environment/lib/python3.8/site-packages/synda/sdt/sdtiaction.py", line 526, in queue
    from synda.sdt import sdfilequery
  File "/cluster/home/xale/.conda/envs/synda-environment/lib/python3.8/site-packages/synda/sdt/sdfilequery.py", line 21, in <module>
    from synda.sdt import sddb
  File "/cluster/home/xale/.conda/envs/synda-environment/lib/python3.8/site-packages/synda/sdt/sddb.py", line 94, in <module>
    connect()
  File "/cluster/home/xale/.conda/envs/synda-environment/lib/python3.8/site-packages/synda/sdt/sddb.py", line 50, in connect
    sddbobj.create_tables(conn)
  File "/cluster/home/xale/.conda/envs/synda-environment/lib/python3.8/site-packages/synda/sdt/sddbobj.py", line 48, in create_tables
    conn.execute("create table if not exists file (file_id INTEGER PRIMARY KEY, url TEXT, file_functional_id TEXT, filename TEXT, local_path TEXT, data_node TEXT, checksum TEXT, checksum_type TEXT, duration INT, size INT, rate INT, start_date TEXT, end_date TEXT, crea_date TEXT, status TEXT, error_msg TEXT, sdget_status TEXT, sdget_error_msg TEXT, priority INT, tracking_id TEXT, model TEXT, project TEXT, variable TEXT, last_access_date TEXT, dataset_id INT, insertion_group_id INT, timestamp TEXT)")
sqlite3.OperationalError: database is locked

I am really puzzled because I have erased the previous synda data folder, created a new synda environment and re-installed synda, and despite all of this the database lock error keeps appearing whatever I do. Unfortunately, "sqlite3.OperationalError: database is locked" is a very cryptic error message, and I can find little help for it online. I am open to any other suggestion from you, many thanks for your help!

3r1d commented 2 years ago

Hello,

Here are two procedures I have used with some success to fix the "sqlite3.OperationalError: database is locked" error.

procedure-1: check whether a process is using the database

$ fuser "dbfile"

If such a process exists, stop or kill it.

procedure-2: check whether an sqlite *db-journal file exists

If so, open the database with the sqlite3 command:

$ sqlite3 "dbfile"

Then run a simple query (e.g. select count(1) from transfert where status='running';) and exit the sqlite3 command.

Check whether the journal file has been deleted. If so, try running synda again.
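The second procedure can also be sketched with Python's standard sqlite3 module. This is only an illustration under a few assumptions: the function names are hypothetical, the journal is taken to be the database path with "-journal" appended, and the query reads SQLite's built-in sqlite_master table instead of the synda table used above, so that it works on any database. Note that sqlite3.connect creates the file if it does not exist, so pass the path of an existing database.

```python
import sqlite3
from pathlib import Path


def journal_path(db_file):
    """Return the rollback journal path SQLite uses for db_file."""
    return Path(str(db_file) + "-journal")


def probe_database(db_file, timeout=5.0):
    """Open db_file and run a trivial read query.

    Returns "ok" if the query succeeds, or the error text (for example
    "database is locked") if SQLite raises OperationalError.
    """
    conn = sqlite3.connect(str(db_file), timeout=timeout)
    try:
        # Reading the schema forces SQLite to actually access the file
        conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
```

A typical use would be to check journal_path(db).exists(), call probe_database(db), and then check for the journal file again before retrying synda. If the probe itself reports a lock, the first procedure (finding the process that holds the file) is the one to try.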

Regards

pjournou-ipsl commented 2 years ago

@xlevine I think I have just done the same re-installation as you, on another system, and I can't reproduce the "database is locked" problem when I run the synda queue subcommand as the first action. From my point of view, the last point I need to check with you is the following: since the environment that synda uses is given by the path set in the ST_HOME Linux environment variable, can you confirm that the path shown when you display this variable is the new synda environment you want to use? Perhaps a script (.bashrc, ...) sets (overwrites) this variable without your knowledge.

xlevine commented 2 years ago

Hi @pjournou-ipsl and @3r1d : Many thanks for your help. I finally made it work, but I am not sure exactly how.... I set up my conda environment and synda once more, but I think what made the difference was to also modify my bashrc file: while I wanted to use Anaconda3 to initialise conda, my bashrc was actually pointing to Miniconda3 instead. After modifying the bashrc script to initialise my conda environment with Anaconda3, the DB block disappeared. Maybe that was the source of the error? Anyway, it works fine now, and I am very grateful for your help! Best, Xavier.

pjournou-ipsl commented 2 years ago

Excellent news! It is a little frustrating not to understand the root cause of the problem. It seems to show that the locked DB state depended on the Python environment used (probably on the sqlite3 module itself, but I cannot prove it...). That's interesting. Please note that we are thinking about a new database engine...