flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

Error: init failed #23

Closed mattbawn closed 4 years ago

mattbawn commented 4 years ago

I have recently installed Pantagruel on our cluster (it reports "This is Pantagruel pipeline version 82ae5db4dea2c59df6019ed7f16babf442180cbf") and am trying to run it on 134 custom assemblies. I believe I have created the correct file structure and data file:

head genomes/strain_infos_database.txt 
assembly_id genus   species strain  taxid   locus_tag_prefix
ragout_1    Salmonella  enterica    ERR024387   90371   1
ragout_2    Salmonella  enterica    ERR024388   90371   2
ragout_3    Salmonella  enterica    ERR024389   90371   3
ragout_4    Salmonella  enterica    ERR024391   90371   4
ragout_5    Salmonella  enterica    ERR024392   90371   5
ragout_6    Salmonella  enterica    ERR024394   90371   6
ragout_7    Salmonella  enterica    ERR024395   90371   7
ragout_8    Salmonella  enterica    ERR024396   90371   8
ragout_9    Salmonella  enterica    ERR024397   90371   9

However, I get the following error when I try to run init:


pantagruel -d database -r . -f PANTAGFAM -a genomes/ init
activate does not accept more than one argument:
['pantagruel', '-d', 'database', '-r', '.', '-f', 'PANTAGFAM', '-a', 'genomes/', 'init']

This is Pantagruel pipeline version 82ae5db4dea2c59df6019ed7f16babf442180cbf using source code from repository '/opt/software/pantagruel'
set gene family prefix to 'PANTAGFAM'
set custom (raw) genome assembly source folder to '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/genomes'
# will run tasks: init
Default: set runmode to 'normal'
Default: set identity to 'undisclosed'
Default: set NCBI Taxonomy source folder to '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/NCBI/Taxonomy_2019-10-16'
Default: set gene tree type to 'fullgenetree'
Default: set gene tree/species tree reconciliation method to 'ALE'
Default: will use a strict core-genome gene set, i.e. genes present in a single copy in all the studied genomes.

!!! WARNING: strict core-genome definition can be very resctrictive, especially when including draft genome in the study.
You might prefer to use a pseudo-core genome definition instead, i.e. selecting gene present in a minimum fraction of genomes, for instance 98%.
A sensible threshold should avoid that selected genes have an approximately homogeneous distribution,
notably that the absent fraction is not restricted to a few genomes. This threshold will thus depend on the dataset.
To choose a sensible value, AFTER TASK 03, you can run the INTERACTIVE script:
'/opt/software/pantagruel/scripts/choose_min_genome_occurrence_pseudocore_genes.sh'
and then manualy edit the value of variable 'pseudocoremingenomes' in the pantagruel configuration file.

Default (only relevant to 'core' task): core sequence type is set to 'cds'
Default (only relevant to 'core' task): set population delination branch support threshold to default
Default (only relevant to 'core' task): set population delination branch support threshold multiplier for leaf populations to 1.5
Default (only relevant to 'core' task): set population delination branch support threshold to 80
Default (only relevant to 'core' task): set reference tree rooting method to treebalance
Default: all computations will be run locally
[2019-10-16 10:24:07] Pantagrel pipeline task init: initiate pangenome database.
Create new task folder '/nbi/Research-Groups/IFR/Rob-Kingsley/R134_Pantagruel/database'
  File "<stdin>", line 7
    print '%s=%s'%(par, val)
                ^
SyntaxError: invalid syntax
  File "<stdin>", line 7
    print '%s=%s'%(par, val)
                ^
SyntaxError: invalid syntax
  File "<stdin>", line 7
    print '%s=%s'%(par, val)
                ^
SyntaxError: invalid syntax
  File "<stdin>", line 7
    print '%s=%s'%(par, val)
                ^
SyntaxError: invalid syntax
  File "<stdin>", line 12
    raise ValueError, "%s\nthe fields %s are missing from the header.%s%s"%(errprefix, repr(list(missingfields)), sextra, errsuffix)
                    ^
SyntaxError: invalid syntax
ERROR: Custom strain info file detected; Error in format
ERROR: Pantagrel pipeline task init: failed.

Any help would be appreciated.

Thanks

flass commented 4 years ago

Hi Mat, thanks for reporting. These errors look like what Python 3 would say if fed Python 2 code. That is likely to happen if your default python is Python 3, which it really should be, as it's the modern version. Because I'm old-fashioned, I used Python 2 extensively in Pantagruel and referred to it ambiguously as just python. I will try to disambiguate which interpreter is used throughout the pipeline so that this is avoided, and I'll notify you of the fix. In the meantime, you can try declaring your default python to be Python 2 before running the Pantagruel command; something like alias python=python2 should probably work.
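To illustrate the diagnosis, here is a minimal sketch (not part of Pantagruel) that checks whether the running interpreter can parse the two kinds of statement shown in the traceback above, and reports which major version the bare `python` command resolves to. The helper names `is_valid_syntax` and `default_python_major` are made up for this example, and the text of the `raise` message is shortened.

```python
import shutil
import subprocess
import sys

# The two Python 2-only statement forms seen in the traceback
# (the raise message is abbreviated for this illustration).
PY2_SNIPPETS = [
    "print '%s=%s'%(par, val)",
    "raise ValueError, 'fields are missing from the header'",
]


def is_valid_syntax(source):
    """Return True if the running interpreter can parse `source`."""
    try:
        compile(source, "<stdin>", "exec")
    except SyntaxError:
        return False
    return True


def default_python_major():
    """Return the major version of the `python` found on PATH, or None."""
    exe = shutil.which("python")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-c", "import sys; print(sys.version_info[0])"],
        capture_output=True,
        text=True,
    )
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None


if __name__ == "__main__":
    print("running under Python %d.%d" % sys.version_info[:2])
    print("bare 'python' on PATH is major version: %s" % default_python_major())
    for snippet in PY2_SNIPPETS:
        status = "parses" if is_valid_syntax(snippet) else "SyntaxError"
        print("%-60s %s" % (snippet, status))
```

If the bare `python` reports major version 3, scripts that call `python` with Python 2 code will fail in the way shown in the log.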

flass commented 4 years ago

Commits 76858aa and 18f8df2 should have sorted this.

flass commented 4 years ago

Also, I just noticed that your initial call returns an error that does not come from Pantagruel:

activate does not accept more than one argument:
['pantagruel', '-d', 'database', '-r', '.', '-f', 'PANTAGFAM', '-a', 'genomes/', 'init']

This seems to be coming from some environment wrapper (maybe conda?). Given the logs that follow, it seems the arguments are passed on correctly to pantagruel, so there may be nothing to worry about. Just in case, I would take a look at the resulting configuration file environ_pantagruel_database.sh to check that the variables are set to the values you expect.
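One way to review such a file is to list its variable assignments. The sketch below assumes the configuration file consists of shell lines of the form `name=value` or `export name=value`; that layout is an assumption, not something confirmed in this thread. The function `read_shell_assignments` and the sample variable names prefixed with `example_` are hypothetical; only `pseudocoremingenomes` is a name mentioned in the log above.

```python
import re

# Matches 'name=value' or 'export name=value' (assumed file layout).
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_shell_assignments(text):
    """Collect simple shell variable assignments from `text` into a dict."""
    settings = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match is None:
            continue  # skip comments, blank lines, and other shell code
        name, value = match.groups()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        settings[name] = value
    return settings


# Hypothetical excerpt, for illustration only; not real Pantagruel output.
SAMPLE = """\
# hypothetical excerpt
export example_family_prefix='PANTAGFAM'
export example_custom_assemblies="/path/to/genomes"
pseudocoremingenomes=
"""

if __name__ == "__main__":
    for name, value in sorted(read_shell_assignments(SAMPLE).items()):
        print("%s = %r" % (name, value))
```

To inspect the real file, pass its contents instead of `SAMPLE`, for example `read_shell_assignments(open(path).read())`, where `path` points to wherever environ_pantagruel_database.sh was written. The parser does not expand shell variables, so values that reference other variables are shown unexpanded.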