biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

phylophlan_write_config_file.py - can only concatenate str (not "tuple") to str #59

Closed: cyklee closed this issue 3 years ago

cyklee commented 3 years ago

Running phylophlan_write_config_file.py version 3.0.19 (3 November 2020) with the following input results in an error:

phylophlan_write_config_file --db_type n --db_dna makeblastdb --map_dna blastn --msa muscle --trim trimal --gene_tree1 raxml --gene_tree2 raxml --tree1 raxml --tree2 raxml --force_nucleotides -o test

Traceback (most recent call last):
  File "phylophlan/bin/phylophlan_write_config_file", line 10, in <module>
    sys.exit(phylophlan_write_config_file())
  File "phylophlan/lib/python3.9/site-packages/phylophlan/phylophlan_write_config_file.py", line 349, in phylophlan_write_config_file
    gene_tree1['params'] += ' -m GTRCAT',
TypeError: can only concatenate str (not "tuple") to str

I believe this is due to the trailing comma on line 349: gene_tree1['params'] += ' -m GTRCAT',

The same issue can also be found on line 364.
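
For anyone hitting the same traceback, here is a minimal sketch that reproduces the error outside PhyloPhlAn. It assumes `gene_tree1` is a plain dict whose 'params' value is a string; the initial value used below is made up for illustration and is not taken from the PhyloPhlAn source.

```python
# Hypothetical stand-in for the dict used in phylophlan_write_config_file.py;
# the initial 'params' value is an assumption made for illustration only.
gene_tree1 = {'params': '-p 1989'}

# A trailing comma turns the right-hand side into a one-element tuple.
rhs = ' -m GTRCAT',
print(type(rhs).__name__)

error_message = None
try:
    # Same shape as the failing line: str += tuple raises TypeError.
    gene_tree1['params'] += ' -m GTRCAT',
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Without the trailing comma, the string concatenation works as intended.
gene_tree1['params'] += ' -m GTRCAT'
print(gene_tree1['params'])
```

The failed statement leaves the dict untouched, so the corrected line appends the option exactly once.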

fasnicar commented 3 years ago

Hi, thank you very much for reporting this. It was indeed a mistake from an old commit. It is now fixed with commit f1999e6672f9b4b216a96c15e93f21b136070afe.

I also added a couple of validity checks because I noticed from your command that you are setting --gene_tree1, --gene_tree2, --tree1, and --tree2 all to raxml, which does not make much sense. If you want a gene tree pipeline, then --tree1 has to be a consensus approach. So, please check the wiki if you're in doubt about what you should specify.
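
To illustrate the rule described above (a gene tree pipeline needs a consensus approach for --tree1), here is a rough sketch of what such a check could look like. This is not the code from the fix commit: the function name `check_tree_options` is hypothetical, and the set of consensus tools is an assumption made for the example.

```python
# Hypothetical check; NOT the code from the actual fix commit.
# The set of consensus tools below is an assumption for illustration.
CONSENSUS_TOOLS = {'astral', 'astrid'}


def check_tree_options(gene_tree1=None, tree1=None):
    """Return a list of error messages for an inconsistent tree setup."""
    errors = []
    # If gene trees are requested, tree1 must combine them via consensus.
    if gene_tree1 is not None and tree1 not in CONSENSUS_TOOLS:
        errors.append(
            '--gene_tree1 is set, so --tree1 must be a consensus tool '
            '(one of: {}), got: {}'.format(
                ', '.join(sorted(CONSENSUS_TOOLS)), tree1)
        )
    return errors


# The combination from the original command is flagged.
print(check_tree_options(gene_tree1='raxml', tree1='raxml'))
# A consensus tool for tree1, or no gene tree at all, passes.
print(check_tree_options(gene_tree1='raxml', tree1='astral'))
print(check_tree_options(tree1='raxml'))
```

Returning a list of messages rather than raising on the first problem lets a caller report every inconsistency in one pass.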

Many thanks, Francesco

cyklee commented 3 years ago

Hi @fasnicar, Thank you for pointing that out and the speedy fix! I'm actually trying to run the supermatrix pipeline using nucleotide input but ran into a few issues:

  1. Can phylophlan_setup_database work for an entire genus?
  2. If not, can I use the phylophlan database with --force_nucleotides? I'm not quite sure how that switch works, so I don't know how to set up the config file. Could you provide an example file where --force_nucleotides has been properly applied? I think this is the same issue mentioned here.

I'm testing out running: phylophlan_write_config_file --db_type a --db_aa diamond --map_aa diamond --map_dna diamond --msa muscle --trim trimal --tree1 fasttree --tree2 raxml --force_nucleotides -o test5.cfg

Kind regards, Kevin

fasnicar commented 3 years ago

Hi Kevin,

At the moment, phylophlan_setup_database can only download species-specific UniRef90 proteins. So, for a genus-level phylogeny, I would recommend you use the phylophlan database. Alternatively, if you identify a set of proteins conserved at the genus level, you can provide them to phylophlan_setup_database to build your custom db.

The --force_nucleotides param forces PhyloPhlAn to use the nucleotides after mapping the proteins in the db to the input genomes. You also need to specify it when you generate the configuration file, because the parameters for the phylogeny inference tools must reflect that they will use a nucleotide MSA.

The command for the config file looks fine.

Many thanks, Francesco

cyklee commented 3 years ago

Hi Francesco, Thank you for your excellent explanations and sorry for the late response. I've successfully run inferences with both protein and nucleotide data.

With gratitude, Kevin