biobakery / MetaPhlAn

MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data
http://segatalab.cibio.unitn.it/tools/metaphlan/index.html
MIT License

metaphlan3 phylogenetic tree #92

Closed fconstancias closed 3 years ago

fconstancias commented 4 years ago

Thanks a lot for releasing MetaPhlAn 3. Is there a phylogenetic tree available for MetaPhlAn 3? If not, is there an easy way to compute one?

Thanks a lot.

fbeghini commented 4 years ago

Hi @fconstancias , we have not provided a phylogenetic tree with all the species included in MetaPhlAn 3. For this purpose, you could use the new PhyloPhlAn 3 for building the tree using --diversity high. The genome accessions can be extracted from the mpa_v30_CHOCOPhlAn_201901.pkl file. If you have issues in retrieving the genomes from NCBI, let me know, I can build a tarball and share it with you.
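
For readers following along, the accession extraction mentioned above could be sketched as follows. This is a minimal sketch, not the maintainers' script: it assumes the pkl file is a bz2-compressed pickle whose 'taxonomy' entry is keyed by full lineage strings ending in a `t__GCA_...` field (the layout described later in this thread). The function names `load_mpa_pkl` and `genome_accessions` are hypothetical.

```python
import bz2
import pickle


def load_mpa_pkl(path):
    """Load the MetaPhlAn database pickle.

    Assumption: the file is a bz2-compressed pickle.
    """
    with bz2.open(path, "rb") as handle:
        return pickle.load(handle)


def genome_accessions(taxonomy_keys):
    """Return the sorted, unique accessions found in the trailing t__ field.

    Assumption: each key looks like 'k__...|...|s__...|t__GCA_xxxxxxxxx'.
    Keys without a trailing t__ field are skipped.
    """
    accessions = set()
    for key in taxonomy_keys:
        last_field = key.split("|")[-1]
        if last_field.startswith("t__"):
            accessions.add(last_field[len("t__"):])
    return sorted(accessions)


# Hypothetical usage (the file name is taken from the comment above):
# db = load_mpa_pkl("mpa_v30_CHOCOPhlAn_201901.pkl")
# for accession in genome_accessions(db["taxonomy"]):
#     print(accession)
```

The resulting list can then be used to download the assemblies from NCBI; how the download is done is outside the scope of this sketch.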

fconstancias commented 4 years ago

Hi @fbeghini, thanks for your input. If you have time to build a tarball that would be great.

fbeghini commented 4 years ago

Here's the tarball https://drive.google.com/file/d/1qKQijzybSWHCHdepoLAtgkKIBihmbm7c/view?usp=sharing

Fedorov113 commented 4 years ago

Excuse me, but why don't you provide a phylogenetic tree? It is essential for the UniFrac method, for example.

fconstancias commented 4 years ago

I actually got the following error trying to generate a phylogenetic tree from metaphlan v3 reference genomes. Any idea what I am doing wrong?

phylophlan --version
PhyloPhlAn version 3.0.51 (11 May 2020)

(phylophlan) bt141-143:tree fconstan$ phylophlan -i test_some_genomes --diversity high --fast --nproc 2 -d phylophlan -f supertree_aa.cfg --output_folder test

Loading files from "/Users/fconstan/Projects/Oral/metaphlan3/tree/test_some_genomes"
Mapping "phylophlan" on 21 inputs (key: "map_dna")
Mapping "test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554635.2_ASM255463v2_genomic.fna"
Mapping "test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna"

[e] Command '['/Users/fconstan/miniconda3/envs/phylophlan/bin/diamond', 'blastx', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'test/test_some_genomes_phylophlan/tmp/map_dna/GCA_002554315.1_ASM255431v1_genomic.b6o.bkp']' died with <Signals.SIGILL: 4>.

[e] cannot execute command command_line: /Users/fconstan/miniconda3/envs/phylophlan/bin/diamond blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0 --query test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna --db phylophlan_databases/phylophlan/phylophlan.dmnd --out test/test_some_genomes_phylophlan/tmp/map_dna/GCA_002554315.1_ASM255431v1_genomic.b6o.bkp stdin: None stdout: None env: {'TERM_PROGRAM': 'Apple_Terminal', 'TERM': 'xterm-256color', 'SHELL': '/bin/bash', 'TMPDIR': '/var/folders/cy/3lgpr0mx1vlfpldzfs6xfckc0000gq/T/', 'Apple_PubSub_Socket_Render': '/private/tmp/com.apple.launchd.nmUtCduo7V/Render', 'CONDA_SHLVL': '1', 'TERM_PROGRAM_VERSION': '421.2', 'CONDA_PROMPT_MODIFIER': '(phylophlan) ', 'TERM_SESSION_ID': '40922023-4CFA-4519-8AF9-FB929E93D7C1', 'USER': 'fconstan', 'CONDA_EXE': '/Users/fconstan/miniconda3/bin/conda', 'SSH_AUTH_SOCK': '/private/tmp/com.apple.launchd.aGWyTzLd6O/Listeners', '_CECONDA': '', 'PATH': '/Users/fconstan/miniconda3/envs/phylophlan/bin:/Users/fconstan/.jenv/shims:/Users/fconstan/.jenv/bin:/Users/fconstan/miniconda3/condabin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/munki:/opt/X11/bin', '': '/Users/fconstan/miniconda3/envs/phylophlan/bin/phylophlan', 'CONDA_PREFIX': '/Users/fconstan/miniconda3/envs/phylophlan', 'PWD': '/Users/fconstan/Projects/Oral/metaphlan3/tree', 'JENV_LOADED': '1', 'XPC_FLAGS': '0x0', 'XPC_SERVICE_NAME': '0', '_CE_M': '', 'HOME': '/Users/fconstan', 'SHLVL': '1', 'LOGNAME': 'fconstan', 'CONDA_PYTHON_EXE': '/Users/fconstan/miniconda3/bin/python', 'JENV_SHELL': 'bash', 'LC_CTYPE': 'UTF-8', 'CONDA_DEFAULT_ENV': 'phylophlan', 'DISPLAY': '/private/tmp/com.apple.launchd.Gxdot2MuOz/org.macosforge.xquartz:0', '__CF_USER_TEXT_ENCODING': '0x1F7:0x0:0x2'}

[e] Command '['/Users/fconstan/miniconda3/envs/phylophlan/bin/diamond', 'blastx', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'test/test_some_genomes_phylophlan/tmp/map_dna/GCA_002554315.1_ASM255431v1_genomic.b6o.bkp']' died with <Signals.SIGILL: 4>.

[e] error while mapping {'program_name': '/Users/fconstan/miniconda3/envs/phylophlan/bin/diamond', 'params': 'blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0', 'input': '--query', 'database': '--db', 'output': '--out', 'version': 'version', 'command_line': '#program_name# #params# #input# #database# #output#'} test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna phylophlan_databases/phylophlan/phylophlan.dmnd test/test_some_genomes_phylophlan/tmp/map_dna GCA_002554315.1_ASM255431v1_genomic.b6o.bkp 1 False

[e] Command '['/Users/fconstan/miniconda3/envs/phylophlan/bin/diamond', 'blastx', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'test/test_some_genomes_phylophlan/tmp/map_dna/GCA_002554315.1_ASM255431v1_genomic.b6o.bkp']' died with <Signals.SIGILL: 4>.

[e] gene_markers_identification crashed

fbeghini commented 4 years ago

> Excuse me, but why don't you provide a phylogenetic tree? It is essential for the UniFrac method, for example.

@Fedorov113 We do not provide it because we have not built one yet. The previous tree, built from the MetaPhlAn2 reference genomes, was computed for a different project.

fbeghini commented 4 years ago

> I actually got the following error trying to generate a phylogenetic tree from metaphlan v3 reference genomes. Any idea what I am doing wrong?

You should post this on https://github.com/biobakery/phylophlan/

Fedorov113 commented 4 years ago

> > Excuse me, but why don't you provide a phylogenetic tree? It is essential for the UniFrac method, for example.
>
> @Fedorov113 We do not provide it because we have not built one yet. The previous tree, built from the MetaPhlAn2 reference genomes, was computed for a different project.

Yeah, I am building it myself right now and will share the results. I ran into a problem: I read mpa_pkl['taxonomy'] and take the GCA_xxxx accession from the |t__ part of each key in mpa_pkl['taxonomy'].keys().

However, there are some instances like k__Bacteria|p__Tenericutes|c__Mollicutes|o__Mycoplasmatales|f__Mycoplasmataceae|g__Mycoplasma|s__Mycoplasma_wenyonii|t__GCA_002705755 and k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Microbacteriaceae|g__Microbacterium|s__Microbacterium_esteraromaticum|t__GCA_002705755, which share the same accession, GCA_002705755. This is obviously an error.
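
A check for such collisions could look roughly like this. It is a sketch under the assumption that the taxonomy keys are plain lineage strings with a trailing `t__` field, as in the two examples above; `shared_accessions` is a hypothetical helper name, not part of MetaPhlAn.

```python
from collections import defaultdict


def shared_accessions(taxonomy_keys):
    """Map each accession to its lineages, keeping only accessions
    that appear under more than one lineage."""
    by_accession = defaultdict(list)
    for key in taxonomy_keys:
        last_field = key.split("|")[-1]
        if last_field.startswith("t__"):
            by_accession[last_field[len("t__"):]].append(key)
    return {acc: keys for acc, keys in by_accession.items() if len(keys) > 1}


# The two lineages quoted above, which share one accession.
example_keys = [
    "k__Bacteria|p__Tenericutes|c__Mollicutes|o__Mycoplasmatales"
    "|f__Mycoplasmataceae|g__Mycoplasma|s__Mycoplasma_wenyonii|t__GCA_002705755",
    "k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales"
    "|f__Microbacteriaceae|g__Microbacterium"
    "|s__Microbacterium_esteraromaticum|t__GCA_002705755",
]

for accession, lineages in shared_accessions(example_keys).items():
    print(accession, len(lineages))
```

Running the same function over all keys of mpa_pkl['taxonomy'] would list every accession affected, which makes it easier to report the problem or exclude those genomes before building the tree.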

Or should I take info about genomes from mpa_pkl['markers']?

Thank you!

Fedorov113 commented 4 years ago

I believe we should keep this issue open until we get a proper solution:

  1. Prebuilt tree for current chocophlan database.
  2. All the scripts necessary for building the tree.

I selected 10360 bacterial and archaeal genomes (one genome per s__*** clade in mpa_pkl), and PhyloPhlAn 3.0 has been running for 30+ hours on 100 cores and is still not finished. This is clearly a task that not everyone can perform, given the computational resources it requires.
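
The selection step described here could be sketched as below. The exact rule used to pick the representative genome is not stated in this thread, so the sketch simply takes the first genome in sorted key order, which is an arbitrary assumption; `one_genome_per_species` is a hypothetical name.

```python
def one_genome_per_species(taxonomy_keys,
                           kingdoms=("k__Bacteria", "k__Archaea")):
    """Pick one accession per s__ clade, restricted to the given kingdoms.

    Assumption: the representative is the first genome in sorted key
    order; the real selection criterion may differ.
    """
    chosen = {}
    for key in sorted(taxonomy_keys):
        fields = key.split("|")
        if fields[0] not in kingdoms:
            continue
        species = next((f for f in fields if f.startswith("s__")), None)
        genome = fields[-1]
        if species is None or not genome.startswith("t__"):
            continue
        # setdefault keeps the first genome seen for each species.
        chosen.setdefault(species, genome[len("t__"):])
    return chosen
```

The returned dictionary maps each species clade to one accession, so it can also serve later as the basis for relabeling tree tips.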

fbeghini commented 4 years ago

cc @fasnicar

fconstancias commented 4 years ago

@Fedorov113, were you able to generate the tree?

Fedorov113 commented 4 years ago

@fconstancias yes, but I haven't checked it in detail yet. Its tips also need to be renamed from GCA_** identifiers to the ChocoPhlAn s__*** names. This data is available in the notebook prepare_genomes_and_metadata.

The code is a bit messy. I will return to it in a couple of weeks, but I would be happy if you could help; here is the repo
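
The tip renaming could be sketched without any tree library by substituting accessions directly in the Newick string. This is not the notebook's code: it assumes tip labels are bare GCA accessions (optionally with a version suffix such as `.1`) and that the replacement names contain no Newick special characters (parentheses, commas, colons, semicolons). `rename_tips` and the mapping passed to it are hypothetical.

```python
import re

# Matches GCA_<digits>, optionally followed by a .<version> suffix.
ACCESSION = re.compile(r"GCA_\d+(?:\.\d+)?")


def rename_tips(newick, accession_to_species):
    """Replace GCA accessions in a Newick string with species names.

    Accessions missing from the mapping are left unchanged. A versioned
    accession falls back to its unversioned form for the lookup.
    """
    def swap(match):
        label = match.group(0)
        if label in accession_to_species:
            return accession_to_species[label]
        unversioned = label.split(".")[0]
        return accession_to_species.get(unversioned, label)

    return ACCESSION.sub(swap, newick)
```

A mapping like the one needed here can be derived from the taxonomy keys by inverting a species-to-accession table. For large trees or labels with unusual characters, a dedicated Newick parser would be the safer choice.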

[Image: tree_view]

fasnicar commented 4 years ago

Hello everyone,

Many thanks @Fedorov113 for doing this. We are actually working on building a reference phylogeny for MetaPhlAn 3.0 using PhyloPhlAn 3.0. I think we should be able to release it in a few weeks.

Many thanks, Francesco

fconstancias commented 4 years ago

Dear @fasnicar,

Any update regarding the metaphlan3 tree?

Thanks

fbeghini commented 4 years ago

You can find the Newick tree here: https://github.com/biobakery/MetaPhlAn/tree/3.0/metaphlan/utils . There's also an R script for calculating UniFrac distances from a merged MetaPhlAn profile file.

dhiru16 commented 2 years ago

The Newick file provided for MetaPhlAn 4 is not working.

NeginValizadegan commented 1 year ago

Can we create a tree at other taxonomic levels, e.g., genus and family? Currently I can only do UniFrac calculations at the species level.