biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

Problem at last step (gene refining) and --maas options #52

Open pedrojcy91 opened 3 years ago

pedrojcy91 commented 3 years ago

Dear developers, I have encountered a problem when running PhyloPhlAn 3. The program does not generate the final tree; it stops at the final stage (gene tree refining with RAxML). This is the command and the parameters I used:

phylophlan -i folder-genomes -d phylophlan -o try-out -t a --diversity high --nproc 72 -f /home/egg/miniconda3/envs/phylophlan3/lib/python3.7/site-packages/phylophlan/phylophlan_configs/supertree_aa.cfg --maas /home/egg/miniconda3/envs/phylophlan3/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/phylophlan.tsv

I just want to obtain the consensus tree, as was done in the old versions of PhyloPhlAn with the 400 markers (using the option -u, user tree, with my input folder). My input folder contains .faa files.

The initial steps run properly:

Inputs already checked
Inputs already cleaned
Loading files from "picos-try-phylo3-out/tmp/clean_aa"
"phylophlan" markers already mapped (key: "map_aa")
Markers already selected
Markers already extracted
Inputs already translated into markers
Markers already aligned (key: "msa")
Markers already trimmed (key: "trim")
Markers already subsampled
Gene trees already built
Polytomies already resolved
Refining 206 gene trees
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0351.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0336.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0350.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0133.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0023.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0298.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0218.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0233.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0267.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0257.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0177.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0135.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0000.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0084.aln"
Refining gene tree "picos-try-phylo3-out/tmp/sub/p0272.aln"

Then, these are the specific errors:

[e] Command '['/home/egg/miniconda3/envs/phylophlan3/bin/raxmlHPC', '-m', 'PROTCATLG', '-p', '1989', '-t', 'picos-try-phylo3-out/tmp/gene_tree1_polytomies/p0298.tre', '-w', '/home/pedroj/phyloplan/picos-try-phylo3-out/tmp/gene_tree2', '-s', 'picos-try-phylo3-out/tmp/sub/p0298.aln', '-n', 'p0298.tre']' returned non-zero exit status 255.

[e] error while executing command_line: /home/egg/miniconda3/envs/phylophlan3/bin/raxmlHPC -m PROTCATLG -p 1989 -t picos-try-phylo3-out/tmp/gene_tree1_polytomies/p0298.tre -w /home/pedroj/phyloplan/picos-try-phylo3-out/tmp/gene_tree2 -s picos-try-phylo3-out/tmp/sub/p0298.aln -n p0298.tre stdin: None stdout: None env: {'LESSOPEN': '| /usr/bin/lesspipe %s', 'CONDA_PROMPT_MODIFIER': '(phylophlan3) ', 'USER': 'pedroj', 'SSH_CLIENT': '193.147.133.245 49821 22', 'LC_TIME': 'es_ES.UTF-8', 'BLASTDB': '/home/egg/Databases/blastdb/', 'XDG_SESSION_TYPE': 'tty', 'SHLVL': '1', 'MOTD_SHOWN': 'pam', 'HOME': '/home/pedroj', 'CONDA_SHLVL': '1', 'OLDPWD': '/home/pedroj', 'SSH_TTY': '/dev/pts/1', 'LC_MONETARY': 'es_ES.UTF-8', 'DBUS_SESSION_BUS_ADDRESS': 'unix:path=/run/user/1002/bus', '_CE_M': '', 'LIBVIRT_DEFAULTURI': 'qemu:///system', 'LOGNAME': 'pedroj', '': '/home/egg/miniconda3/envs/phylophlan3/bin/phylophlan', 'XDG_SESSION_CLASS': 'user', 'TERM': 'xterm', 'XDG_SESSION_ID': '902', '_CE_CONDA': '', 'SSUALIGNDIR': '/home/egg/Programs/ssu-align/lib', 'PATH': '/home/egg/miniconda3/envs/phylophlan3/bin:/home/egg/miniconda3/condabin:/home/egg/Programs/kofam_scan-1.3.0:/home/egg/Programs/sratoolkit.2.10.9-ubuntu64/bin:/home/egg/Programs/snippy-4.6.0/bin:/home/egg/Programs/get_homologues-3.3.3:/home/egg/Programs/ssu-align/bin:/home/egg/Programs/velvet:/home/egg/Programs/picardtools:/home/egg/Programs/blat:/home/egg/Programs/SPAdes-3.14.1/bin:/home/egg/Programs/simka/build/bin:/home/egg/Programs/sharedtools:/home/egg/Programs/rodney/creep:/home/egg/Programs/metabat2/build/bin:/home/egg/Programs/megahit/build:/home/egg/Programs/idba/bin:/home/egg/Programs/Flye/bin:/home/egg/Programs/cmdtools:/home/egg/Programs/canu-2.1.1/build/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin', 'LC_ADDRESS': 'es_ES.UTF-8', 'XDG_RUNTIME_DIR': '/run/user/1002', 'DISPLAY': 'localhost:10.0', 'LANG': 'en_US.UTF-8', 'LC_TELEPHONE': 'es_ES.UTF-8', 
'BIOPERL_INDEX': '/home/egg/Databases/blastdb/', 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:*.xspf=00;36:', 'HMMERDB': '/home/egg/Databases/pfam/', 'CONDA_PYTHON_EXE': '/home/egg/miniconda3/bin/python', 'SHELL': '/bin/bash', 'LC_NAME': 'es_ES.UTF-8', 'LESSCLOSE': '/usr/bin/lesspipe %s %s', 'PFAMDB': '/home/egg/Databases/pfam/', 'CONDA_DEFAULT_ENV': 'phylophlan3', 'LC_MEASUREMENT': 'es_ES.UTF-8', 'LC_IDENTIFICATION': 'es_ES.UTF-8', 'PWD': '/home/pedroj/phyloplan', 'CONDA_EXE': '/home/egg/miniconda3/bin/conda', 'SSH_CONNECTION': '193.147.133.245 49821 10.128.0.15 22', 'XDG_DATA_DIRS': '/usr/local/share:/usr/share:/var/lib/snapd/desktop', 
'LC_NUMERIC': 'es_ES.UTF-8', 'CONDA_PREFIX': '/home/egg/miniconda3/envs/phylophlan3', 'CORTADO': '/home/egg/Programs/cmdtools/', 'LC_PAPER': 'es_ES.UTF-8'}

Is there any way to simplify this? Maybe I need to write another config file instead of supertree_aa.cfg? Many thanks for your help, Best regards, Pedro J

fasnicar commented 3 years ago

Hi Pedro,

Thank you for reporting this. The command line you specified defines a "gene tree" pipeline, not a concatenation pipeline like the one in old PhyloPhlAn versions.

If you want the concatenation approach, you should use the supermatrix_aa.cfg file. The command line should then look like:

phylophlan -i folder-genomes -d phylophlan -o try-out -t a --diversity high --nproc 72 -f supermatrix_aa.cfg

(PhyloPhlAn should automatically check the path /home/egg/miniconda3/envs/phylophlan3/lib/python3.7/site-packages/phylophlan/phylophlan_configs/ for the config file)
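For anyone scripting this call, here is a minimal Python sketch that assembles the same command as an argument list, so that flags and values cannot run together. The helper names `build_phylophlan_command` and `as_shell_line` are hypothetical, and only the flags quoted in this thread are used.

```python
import shlex


def build_phylophlan_command(input_dir, output_dir, config="supermatrix_aa.cfg",
                             database="phylophlan", diversity="high", nproc=72):
    """Return the phylophlan call as a list of arguments (hypothetical helper).

    Only flags that appear in this thread are used; each flag and each
    value is a separate list item.
    """
    return [
        "phylophlan",
        "-i", input_dir,
        "-d", database,
        "-o", output_dir,
        "-t", "a",
        "--diversity", diversity,
        "--nproc", str(nproc),
        "-f", config,
    ]


def as_shell_line(command):
    """Render an argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(part) for part in command)
```

The list can be passed directly to `subprocess.run(command, check=True)`, or printed with `as_shell_line` to paste into a terminal.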

In any case, to debug your error, have you tried running the command reported in the error on its own?

/home/egg/miniconda3/envs/phylophlan3/bin/raxmlHPC -m PROTCATLG -p 1989 -t picos-try-phylo3-out/tmp/gene_tree1_polytomies/p0298.tre -w /home/pedroj/phyloplan/picos-try-phylo3-out/tmp/gene_tree2 -s picos-try-phylo3-out/tmp/sub/p0298.aln -n p0298.tre

Also, I would suggest adding --verbose, since more information will be printed to the terminal, which can help to see what is happening. Ideally, if you can save the output to a file and attach it here, I can use it for debugging.
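One way to capture that output is sketched below in Python. The helper name `run_and_log` and the log file name are hypothetical; the sketch only uses the standard `subprocess` module and would be given the argument list from the error message above.

```python
import subprocess
from pathlib import Path


def run_and_log(command, log_path):
    """Run a command, write its combined stdout/stderr to log_path,
    and return the exit status (hypothetical helper)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, in order
        text=True,
    )
    Path(log_path).write_text(result.stdout)
    return result.returncode


# Example call (not executed here), with the arguments copied from the error:
# status = run_and_log(
#     ["/home/egg/miniconda3/envs/phylophlan3/bin/raxmlHPC", "-m", "PROTCATLG",
#      "-p", "1989",
#      "-t", "picos-try-phylo3-out/tmp/gene_tree1_polytomies/p0298.tre",
#      "-w", "/home/pedroj/phyloplan/picos-try-phylo3-out/tmp/gene_tree2",
#      "-s", "picos-try-phylo3-out/tmp/sub/p0298.aln", "-n", "p0298.tre"],
#     "raxml_p0298.log",
# )
```

A non-zero return value together with the saved log is usually enough to see what the underlying tool was complaining about.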

You can find more info about the phylogenetic pipelines and parameters in the wiki, and some examples in the tutorials. Your case seems to fit "Example 01: S. aureus" and "Example 04: E. coli".

Many thanks, Francesco

pedrojcy91 commented 3 years ago

Hi Francesco, Many thanks for such a quick response. It makes sense now. Indeed, the concatenated tree was generated using the config file supermatrix_aa.cfg.

Just one more simple question. To count how many markers were used to build the concatenated tree, I counted the files in the output folder markers/ (where all the p0 markers are listed). Is this correct? And to assess how many markers each genome had, I assume I need to check the folder markers_aa, is that right?

Finally, I used the .tre output file as my final tree, but there are other generated files such as raxml_bestTree, raxml_info, raxml_result or the resolved.tre. Should I consider some of these as a general phylogenomics approach as well?

Many thanks for your help and best regards, Pedro J

fasnicar commented 3 years ago

Hi Pedro,

Great! The actual number of markers depends on the pipeline and on the cleaning steps performed on the MSAs. Rather than counting the markers inside the markers folder, you should count the contents of the last folder created inside tmp (you can sort with ls -t to find it). For the number of markers in each genome, yes, you should count in either markers_aa or markers_dna (if you have markers_aa, that is the correct one).
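Both counts can be scripted; the Python sketch below is one rough way to do it. It rests on assumptions that are not confirmed in this thread: that the newest subfolder of tmp holds one file per marker, and that each file in markers_aa is a plain-text FASTA for one genome with one ">" header per marker. All function names are hypothetical.

```python
from pathlib import Path


def latest_subfolder(tmp_dir):
    """Return the most recently modified subfolder of tmp_dir (as `ls -t` would list first)."""
    folders = [p for p in Path(tmp_dir).iterdir() if p.is_dir()]
    if not folders:
        raise FileNotFoundError(f"no subfolders found in {tmp_dir}")
    return max(folders, key=lambda p: p.stat().st_mtime)


def count_markers_used(tmp_dir):
    """Count files in the newest subfolder (assumption: one file per marker)."""
    return sum(1 for p in latest_subfolder(tmp_dir).iterdir() if p.is_file())


def markers_per_genome(markers_dir):
    """Map each file name in markers_dir to its number of FASTA records.

    Assumption: one plain-text FASTA file per genome, one '>' header per marker.
    """
    counts = {}
    for path in sorted(Path(markers_dir).iterdir()):
        if path.is_file():
            with path.open() as handle:
                counts[path.name] = sum(1 for line in handle if line.startswith(">"))
    return counts
```

If the files turn out to be compressed or laid out differently, the counting logic would need to be adapted accordingly.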

The .tre output is the one generated by FastTree, as it is the first tree built by PhyloPhlAn (I'm assuming your config file contains both the [tree1] and [tree2] sections, with FastTree specified in [tree1] and RAxML in [tree2]). So the final phylogeny you should consider is the RAxML_bestTree_....
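To locate that file from a script, a small Python sketch along these lines may help. It only assumes that the file name starts with the RAxML_bestTree prefix mentioned above; the function name `find_raxml_best_trees` is hypothetical.

```python
from pathlib import Path


def find_raxml_best_trees(output_dir):
    """Return the files in output_dir whose names start with 'RAxML_bestTree',
    sorted by name (an empty list if none are present)."""
    return sorted(p for p in Path(output_dir).glob("RAxML_bestTree*") if p.is_file())
```

An empty result would suggest that the second tree-building step did not run or did not finish, in which case only the first (FastTree) tree is available.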

The outputs are described in more detail in the wiki: wiki / Output.

Many thanks, Francesco

pedrojcy91 commented 3 years ago

Hi Francesco, Many thanks for all your help and info. We will proceed as you suggest and I will ask you any further queries. All the best, Pedro J