sajo-ebi closed this issue 11 hours ago.
To discuss: whether the test can be run during this data release or before it.
How the test is conducted with fake data:
In the conda env called goci-1383, install the new gwas-utils package locally (git clone and pip install).
Then run the following command and inspect the logs.
(goci-1383) [spotbot@codon-dm-01 goci-1383]$ indexer-manager --logFolder /hps/nobackup/parkinso/spot/gwas/scratch/goci-1383
Running nextflow: false
First attempt failed with error: Command '['false']' returned non-zero exit status 1.. Retrying with -resume option.
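The output above is consistent with a wrapper that runs the command once and, on a non-zero exit, reruns it with `-resume` appended. The double period is what you get when Python's `CalledProcessError` message (which already ends in ".") is interpolated before another ".". A minimal sketch of that pattern, assuming `subprocess.run(..., check=True)` is used; `run_with_resume` is a hypothetical name, not the actual gwas-utils API. Passing `["false"]` as the command mimics the fake-data test.

```python
import subprocess
from typing import List


def run_with_resume(cmd: List[str]) -> int:
    """Run cmd; if it exits non-zero, retry once with -resume appended.

    Hypothetical helper illustrating the behaviour seen in the logs,
    not the gwas-utils implementation. Returns the number of attempts made.
    Raises CalledProcessError if the resumed attempt also fails.
    """
    print(f"Running nextflow: {' '.join(cmd)}")
    try:
        subprocess.run(cmd, check=True)
        return 1
    except subprocess.CalledProcessError as e:
        # str(e) already ends with ".", hence the ".." in the log line.
        print(f"First attempt failed with error: {e}. Retrying with -resume option.")
    # Second attempt: Nextflow reuses cached results of completed tasks.
    subprocess.run(cmd + ["-resume"], check=True)
    return 2
```

With `["false"]` both attempts fail (`false` ignores its arguments), so the second `subprocess.run` raises, which matches a test whose only goal is to confirm the retry path is reached.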
We need to wait for the next data release to know whether this works.
This was included in the last data release, but as far as I can see there was no failed step at the Solr export, so I doubt the retry was ever used.
@karatugo, this was executed in the last data release. Can you check whether it worked correctly?
According to the logs, I can confirm that the -resume retry was triggered, but it did not work correctly.
$ cat /hps/nobackup/parkinso/spot/gwas/logs/solr_indexing/nextflow.log
Sep-09 14:01:26.911 [main] DEBUG nextflow.cli.Launcher - Setting http proxy: ProxyConfig[protocol=http; host=www-proxy.ebi.ac.uk; port=3128]
Sep-09 14:01:27.339 [main] DEBUG nextflow.cli.Launcher - Setting https proxy: ProxyConfig[protocol=https; host=www-proxy.ebi.ac.uk; port=3128]
Sep-09 14:01:27.339 [main] DEBUG nextflow.cli.Launcher - $> nextflow -log /hps/nobackup/parkinso/spot/gwas/logs/solr_indexing/nextflow.log run /hps/software/users/parkinso/spot/gwas/anaconda3/envs/gwas-utils/nf/solr_indexing.nf --job_map_file /hps/nobackup/parkinso/spot/gwas/logs/solr_indexing/job_map.csv -resume
Sep-09 14:01:27.616 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 21.10.6
Unfortunately this caused another error in Nextflow.
Sep-09 14:01:29.890 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 2; maxThreads: 1000
Sep-09 14:01:29.955 [main] ERROR nextflow.cli.Launcher - Unable to acquire lock on session with ID 23e44f5f-0ac7-43d8-ba29-5c4db15a6b6a
Common reasons of this error are:
- You are trying to resume the execution of an already running pipeline
- A previous execution was abruptly interrupted leaving the session open
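This lock error is only visible in nextflow.log, so a caller that retries automatically would need to tell it apart from an ordinary pipeline failure. A minimal sketch, assuming the log line layout shown above (timestamp, `[thread]`, level, logger, ` - `, message); `nextflow_errors` and `is_session_locked` are hypothetical helpers, not part of gwas-utils or Nextflow.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
# Sep-09 14:01:29.955 [main] ERROR nextflow.cli.Launcher - Unable to acquire lock ...
LOG_LINE = re.compile(
    r"^(?P<ts>\w{3}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+) (?P<logger>\S+) - (?P<msg>.*)$"
)

# Prefix of the Nextflow message quoted in this issue.
LOCK_ERROR = "Unable to acquire lock on session"


def nextflow_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, message) for every ERROR entry in a nextflow.log.

    Continuation lines (e.g. "Common reasons of this error are:") do not
    match the pattern and are skipped.
    """
    errors = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group("level") == "ERROR":
            errors.append((m.group("ts"), m.group("msg")))
    return errors


def is_session_locked(lines: Iterable[str]) -> bool:
    """True if the log contains the session-lock error."""
    return any(msg.startswith(LOCK_ERROR) for _, msg in nextflow_errors(lines))
```

Usage would be along the lines of `is_session_locked(open(log_path))`, where `log_path` is the `-log` file passed to Nextflow.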
The run where the indexing job failed despite the resume kicking in was caused by the indexer itself: it was expecting a new field in Solr that was absent at the time of the run, so that failure was expected. In the last run there were failures, and on checking the logs I saw that the resume ran and the indexer job eventually finished, so from what I can see it is working.
Discussed repeated resuming with Ala. We agreed to make the following changes. It does not seem urgent, so I am only creating the ticket for now.
Currently, as part of the data release, whenever Solr indexing fails partway through, we have to trigger the Nextflow resume command manually. This is inefficient, and we lose a lot of productivity, especially when the data release (DR) fails outside working hours or at weekends. We need to handle the resume command in the Python caller of Nextflow. Nextflow is currently invoked as follows:
indexer-manager --newInstance spotrel --oldInstance spotpub --solrHost http://gwas-garfield --solrCore gwas --solrPort 8983 --wrapperScript /hps/software/users/parkinso/spot/gwas/prod/sw/solr-indexer/new_solr_wrapper.sh --logFolder /hps/nobackup/parkinso/spot/gwas/logs/solr_indexing --fullIndex
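The requested change (repeated resuming in the Python caller) could take roughly the following shape. This is a hypothetical sketch, not existing gwas-utils code: `run_nextflow_with_retries`, `max_attempts` and `delay_seconds` are illustrative names and defaults. The first attempt starts fresh and later attempts add `-resume`, with a pause between attempts on the assumption that this gives an interrupted session time to release its lock. `runner` and `sleep` are injectable so the loop can be tested without Nextflow.

```python
import subprocess
import time
from typing import Callable, List


def _run(cmd: List[str]) -> int:
    """Run a command and return its exit code (no exception on failure)."""
    return subprocess.run(cmd).returncode


def run_nextflow_with_retries(
    base_cmd: List[str],
    max_attempts: int = 3,
    delay_seconds: float = 600.0,
    runner: Callable[[List[str]], int] = _run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run base_cmd, then rerun it with -resume until it succeeds or
    max_attempts is reached. Returns the exit code of the last attempt.

    Hypothetical sketch of the proposed change.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    code = 1
    for attempt in range(1, max_attempts + 1):
        # Only the first attempt starts fresh; later ones reuse cached tasks.
        cmd = base_cmd if attempt == 1 else base_cmd + ["-resume"]
        code = runner(cmd)
        if code == 0:
            return 0
        print(f"Attempt {attempt}/{max_attempts} failed with exit code {code}.")
        if attempt < max_attempts:
            # Assumption: waiting lets a previous session release its lock.
            sleep(delay_seconds)
    return code
```

Here `base_cmd` would be the Nextflow invocation seen in the log earlier in this thread, i.e. `["nextflow", "-log", ".../solr_indexing/nextflow.log", "run", ".../nf/solr_indexing.nf", "--job_map_file", ".../solr_indexing/job_map.csv"]` without `-resume`. Capping the attempts keeps a genuinely broken run (such as the missing Solr field) from looping all weekend.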