flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

init: failed #14

Closed kuzman1306 closed 5 years ago

kuzman1306 commented 5 years ago

Dear Florent,

I have tried to initialise the Pantagruel pipeline, but I am getting the following error:

pantagruel -d agrobacterium -r /home/kuzman/ -f PANTAGFAM -I kuzmanovic1306@gmail.com  -a /home/kuzman/agrobacterium_genomes init
This is Pantagruel pipeline version 534d87591763972e01d70fe54dda5fb5b4a4445c using source code from repository '/home/kuzman/pantagruel_pipeline/pantagruel'
set gene family prefix to 'PANTAGFAM'
set identity to 'kuzmanovic1306@gmail.com'
set custom (raw) genome assembly source folder to '/home/kuzman/agrobacterium_genomes'
# will run tasks: init
[2019-07-31 22:27:09] Pantagrel pipeline task init: initiate pangenome database.
Default: set runmode to 'normal'
Default: set NCBI Taxonomy source folder to '/home/kuzman/NCBI/Taxonomy_2019-07-31'
Default: set gene tree type to 'fullgenetree'
Default: will use a strict core-genome gene set, i.e. genes present in a single copy in all the studied genomes.

!!! WARNING: strict core-genome definition can be very resctrictive, especially when including draft genome in the study.
You might prefer to use a pseudo-core genome definition instead, i.e. selecting gene present in a minimum fraction of genomes, for instance 98%.
A sensible threshold should avoid that selected genes have an approximately homogeneous distribution,
notably that the absent fraction is not restricted to a few genomes. This threshold will thus depend on the dataset.
To choose a sensible value, AFTER TASK 03, you can run the INTERACTIVE script:
'/home/kuzman/pantagruel_pipeline/pantagruel/scripts/choose_min_genome_occurrence_pseudocore_genes.sh'
and then manualy edit the value of variable 'pseudocoremingenomes' in the pantagruel configuration file.
Default (only relevant to 'core' task): core sequence type is set to 'cds'
Default (only relevant to 'core' task): set population delination branch support threshold to default
Default (only relevant to 'core' task): set population delination branch support threshold multiplier for leaf populations to 1.5
Default (only relevant to 'core' task): set population delination branch support threshold to 80
Default (only relevant to 'core' task): set reference tree rooting method to treebalance
Default: all computations will be run locally
export cladesupp=70
export subcladesupp=35
export criterion=bs
export withinfun=median
created init file at '/home/kuzman/agrobacterium/environ_pantagruel_agrobacterium.sh'
Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
TypeError: not all arguments converted during string formatting
ERROR: Custom strain info file detected; Error in format
ERROR: Pantagrel pipeline task init: failed.

I am using an Ubuntu 18.10 machine. I have attached the input files.

I would appreciate it if you could help me solve it.

I look forward to hearing from you.

Cheers,

Nemanja

agrobacterium_genomes.zip

flass commented 5 years ago

Hi Nemanja,

happy to see some agrobacteria around here!

There was a bug in the error handling that prevented the proper error message from being shown (fixed in b20d316); thanks for pointing it out! With your data, the error message would now be:

ValueError: Error in format of strain info file 'strain_infos_vitis3.txt':
the fields ['locus_tag_prefix'] are missing from the header.
Extra fields were detected: ['locus_tag_prefix\r']. Maybe the header was misspelled? 
Please edit the file accordingly.
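To illustrate how the same field can be reported as both missing and extra, here is a minimal Python sketch. It is not Pantagruel's actual code: the function name `check_header`, the tab-separated layout and the `strain` column are assumptions made for the example. It only shows that a header line whose final '\n' is stripped still carries a trailing '\r' on its last field when the file has Windows line endings.

```python
# Hypothetical set of expected columns; the real strain info file has more.
EXPECTED_FIELDS = ["strain", "locus_tag_prefix"]


def check_header(header_line, expected=EXPECTED_FIELDS):
    """Return (missing, extra) field lists for a tab-separated header line."""
    # Only the newline is stripped here, so a Windows carriage return
    # stays glued to the last field name.
    fields = header_line.rstrip("\n").split("\t")
    missing = [f for f in expected if f not in fields]
    extra = [f for f in fields if f not in expected]
    return missing, extra


# Header saved with Windows line endings: the last field no longer matches.
print(check_header("strain\tlocus_tag_prefix\r\n"))
# Header saved with Unix line endings: nothing missing, nothing extra.
print(check_header("strain\tlocus_tag_prefix\n"))
```

With the Windows-style header, the last field is read as 'locus_tag_prefix\r', which is why it shows up in the "extra" list while the expected name is listed as missing.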

This seems to be because the file was edited under Windows, so the end-of-line sequence is '\r\n' instead of the '\n' expected on Linux systems. I would recommend that you search for all '\r' characters and replace them with an empty string, in that file and in any other input file, including contigs. On Unix systems you can use this sort of command to clean your files:

sed -e 's/\r//g' windowsfile > linuxfile
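If there are many input files, the same clean-up can be scripted. The following is a rough Python sketch of the idea, not part of Pantagruel; the function names `strip_carriage_returns` and `clean_folder` are made up for the example. It assumes every file in the folder is plain text: it should not be run on compressed or binary files, since removing carriage-return bytes would corrupt them.

```python
from pathlib import Path


def strip_carriage_returns(path):
    """Rewrite the file in place without carriage-return bytes.

    Returns the number of bytes removed (0 if the file was already clean).
    """
    data = Path(path).read_bytes()
    cleaned = data.replace(b"\r", b"")
    if cleaned != data:
        Path(path).write_bytes(cleaned)
    return len(data) - len(cleaned)


def clean_folder(folder):
    """Clean every regular file under `folder` (recursively).

    Returns a dict mapping each modified file path to the number of
    carriage-return bytes removed from it.
    """
    changed = {}
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file():
            removed = strip_carriage_returns(p)
            if removed:
                changed[str(p)] = removed
    return changed
```

For instance, `clean_folder("/home/kuzman/agrobacterium_genomes")` would rewrite the affected files in place and report which ones were changed; working on a copy of the folder first is safer.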

I hope this helps and that not too many errors will come up after!

Cheers, Florent

kuzman1306 commented 5 years ago

Hi Florent, It solved the problem. Thank you for your effort. Cheers, Nemanja

flass commented 5 years ago

No problem! Thank you for reporting, and don't hesitate to do so again! Cheers, Florent