flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

No internet connection and speclist #43

Closed: mattbawn closed this issue 3 years ago

mattbawn commented 3 years ago

Hi Florent,

I have re-run the pipeline and also used it to annotate my genomes. It fails at step 03 with the error:

--2020-09-18 17:18:07--  (try:20)  http://www.uniprot.org/docs/speclist
Connecting to www.uniprot.org (www.uniprot.org)|128.175.245.202|:80... failed: Connection timed out.
Connecting to www.uniprot.org (www.uniprot.org)|193.62.193.81|:80... failed: Connection timed out.
Giving up.

Traceback (most recent call last):
  File "/pantagruel/scripts/pantagruel_sqlitedb_genome_populate.py", line 355, in <module>
    raise ValueError, "specified input file '%s' cannot be found"%nf
ValueError: specified input file '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/New_Install/database/03.database/speclist' cannot be found
ERROR: failed populating database with information on genome assemblies, CDS/protein annotation and gene families
ERROR: something went wrong while initiating and populating database (genome-related tables)
ERROR: Pantagruel pipeline task 3: failed.

This is obviously because my cluster is not connected to the internet. So, following #22, I downloaded http://www.uniprot.org/docs/speclist.txt and saved it in 03.database. I then tried to restart the pipeline using:


pantagruel -i database/environ_pantagruel_database.sh --resume 

The stdout read:

This is Pantagruel pipeline version ee0de31c0f56ef12bc42a2e9b7f009899659dbbe using source code from repository '/pantagruel'

will try and resume computation of task where it was last stopped
# will run tasks: 

but the job died with no output on stderr.

What do you suggest?

Thanks and all the best,

Matt
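Note that the traceback above names the exact file the script looks for: `speclist`, with no `.txt` extension, inside `database/03.database`. The sketch below is a hypothetical helper (the name `place_speclist` and its arguments are illustrative, not part of Pantagruel) that copies a manually downloaded species list to that path and refuses to install a missing or empty file, such as the result of a failed download.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pantagruel): copy a manually downloaded
# UniProt species list to <database dir>/03.database/speclist, the exact
# path the traceback reports as missing (note: no ".txt" extension).
place_speclist() {
  local src="$1" dbdir="$2"
  local dest="${dbdir}/03.database/speclist"
  # Refuse to install a missing or empty file (e.g. a failed download).
  if [ ! -s "$src" ]; then
    echo "ERROR: '${src}' is missing or empty" >&2
    return 1
  fi
  mkdir -p "${dbdir}/03.database"
  cp "$src" "$dest" || return 1
  echo "$dest"   # report where the file was placed
}
```

For example, on a machine with internet access run `wget -O speclist.txt http://www.uniprot.org/docs/speclist.txt` (UniProt may have moved this file since), transfer it to the cluster, then run `place_speclist speclist.txt database` before resuming task 03. Whether task 03 then skips its own download attempt on resume is an assumption not confirmed in this thread.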

flass commented 3 years ago

Hi Matt,

Thanks for reporting this. This is normal behaviour: the resume mode (option -R or --resume) applies to a task, not to the whole pipeline. Sorry, this is not clear in the docs. You still have to name the task to be resumed, and possibly any other tasks you want to run downstream.

So in your case, you should just run the following:

pantagruel -i database/environ_pantagruel_database.sh --resume 03

Or even this, if you want to let it carry on through the following tasks:

pantagruel -i database/environ_pantagruel_database.sh --resume 03 04 05 06 07 08
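Because --resume takes the list of tasks explicitly, the second command can be generated from a first and last task number. This is a minimal sketch of a hypothetical helper (`resume_cmd` is illustrative, not a Pantagruel command); it only uses the -i and --resume options shown above, and it prints the command line rather than running it, so the task list can be checked first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pantagruel): print the command that resumes
# task FIRST and then runs every later task up to LAST.
# It only prints the command line; nothing is executed.
resume_cmd() {
  local env_file="$1" first="$2" last="$3"
  local tasks=() i
  # 10# forces base 10, so "08" and "09" are not parsed as invalid octal.
  for (( i = 10#$first; i <= 10#$last; i++ )); do
    tasks+=( "$(printf '%02d' "$i")" )   # two-digit form, as in the commands above
  done
  echo "pantagruel -i ${env_file} --resume ${tasks[*]}"
}
```

For instance, `resume_cmd database/environ_pantagruel_database.sh 03 08` prints the second command above, and `resume_cmd database/environ_pantagruel_database.sh 03 03` prints the first.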

Just remember that:

I hope this goes well from here!

Cheers, Florent