fconstancias closed this issue 3 years ago.
Hi @fconstancias , we have not provided a phylogenetic tree with all the species included in MetaPhlAn 3.
For this purpose, you could use the new PhyloPhlAn 3 to build the tree with --diversity high.
The genome accessions can be extracted from the mpa_v30_CHOCOPhlAn_201901.pkl file. If you have issues in retrieving the genomes from NCBI, let me know, I can build a tarball and share it with you.
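A minimal Python sketch of pulling the accessions out of the database file. It rests on two assumptions that this thread does not confirm: that mpa_v30_CHOCOPhlAn_201901.pkl is a bz2-compressed pickle, and that its 'taxonomy' dictionary has keys ending in |t__<accession>. The function names load_mpa_pkl and genome_accessions are hypothetical.

```python
import bz2
import pickle


def load_mpa_pkl(path):
    """Load a MetaPhlAn database pickle.

    Assumption: the file is a bz2-compressed pickle.
    """
    with bz2.BZ2File(path, "rb") as handle:
        return pickle.load(handle)


def genome_accessions(taxonomy):
    """Map each t__ accession to the full clade string it appears under.

    Assumption: strain-level keys look like 'k__...|s__Name|t__GCA_000000000'.
    If an accession appears under several clades, only the first is kept.
    """
    accessions = {}
    for clade in taxonomy:
        last_rank = clade.split("|")[-1]
        # Only strain-level (t__) entries carry a genome accession.
        if last_rank.startswith("t__"):
            accessions.setdefault(last_rank[len("t__"):], clade)
    return accessions
```

For example, genome_accessions(load_mpa_pkl("mpa_v30_CHOCOPhlAn_201901.pkl")["taxonomy"]) would give an accession-to-clade mapping whose keys can then be used to fetch the genomes from NCBI.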
Hi @fbeghini, thanks for your input. If you have time to build a tarball that would be great.
Excuse me, but why don't you provide a phylogenetic tree? It is essential for the UniFrac method, for example.
I actually got the following error while trying to generate a phylogenetic tree from the MetaPhlAn 3 reference genomes. Any idea what I am doing wrong?
phylophlan --version
PhyloPhlAn version 3.0.51 (11 May 2020)
(phylophlan) bt141-143:tree fconstan$ phylophlan -i test_some_genomes --diversity high --fast --nproc 2 -d phylophlan -f supertree_aa.cfg --output_folder test
Loading files from "/Users/fconstan/Projects/Oral/metaphlan3/tree/test_some_genomes"
Mapping "phylophlan" on 21 inputs (key: "map_dna")
Mapping "test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554635.2_ASM255463v2_genomic.fna"
Mapping "test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna"
[e] Command '['/Users/fconstan/miniconda3/envs/phylophlan/bin/diamond', 'blastx', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'test/test_some_genomes_phylophlan/tmp/map_dna/GCA_002554315.1_ASM255431v1_genomic.b6o.bkp']' died with <Signals.SIGILL: 4>.
[e] cannot execute command command_line: /Users/fconstan/miniconda3/envs/phylophlan/bin/diamond blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0 --query test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna --db phylophlan_databases/phylophlan/phylophlan.dmnd --out test/test_some_genomes_phylophlan/tmp/map_dna/GCA_002554315.1_ASM255431v1_genomic.b6o.bkp stdin: None stdout: None env: {'TERM_PROGRAM': 'Apple_Terminal', 'TERM': 'xterm-256color', 'SHELL': '/bin/bash', 'TMPDIR': '/var/folders/cy/3lgpr0mx1vlfpldzfs6xfckc0000gq/T/', 'Apple_PubSub_Socket_Render': '/private/tmp/com.apple.launchd.nmUtCduo7V/Render', 'CONDA_SHLVL': '1', 'TERM_PROGRAM_VERSION': '421.2', 'CONDA_PROMPT_MODIFIER': '(phylophlan) ', 'TERM_SESSION_ID': '40922023-4CFA-4519-8AF9-FB929E93D7C1', 'USER': 'fconstan', 'CONDA_EXE': '/Users/fconstan/miniconda3/bin/conda', 'SSH_AUTH_SOCK': '/private/tmp/com.apple.launchd.aGWyTzLd6O/Listeners', '_CECONDA': '', 'PATH': '/Users/fconstan/miniconda3/envs/phylophlan/bin:/Users/fconstan/.jenv/shims:/Users/fconstan/.jenv/bin:/Users/fconstan/miniconda3/condabin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/munki:/opt/X11/bin', '': '/Users/fconstan/miniconda3/envs/phylophlan/bin/phylophlan', 'CONDA_PREFIX': '/Users/fconstan/miniconda3/envs/phylophlan', 'PWD': '/Users/fconstan/Projects/Oral/metaphlan3/tree', 'JENV_LOADED': '1', 'XPC_FLAGS': '0x0', 'XPC_SERVICE_NAME': '0', '_CE_M': '', 'HOME': '/Users/fconstan', 'SHLVL': '1', 'LOGNAME': 'fconstan', 'CONDA_PYTHON_EXE': '/Users/fconstan/miniconda3/bin/python', 'JENV_SHELL': 'bash', 'LC_CTYPE': 'UTF-8', 'CONDA_DEFAULT_ENV': 'phylophlan', 'DISPLAY': '/private/tmp/com.apple.launchd.Gxdot2MuOz/org.macosforge.xquartz:0', '__CF_USER_TEXT_ENCODING': '0x1F7:0x0:0x2'}
[e] Command '['/Users/fconstan/miniconda3/envs/phylophlan/bin/diamond', 'blastx', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'test/test_some_genomes_phylophlan/tmp/map_dna/GCA_002554315.1_ASM255431v1_genomic.b6o.bkp']' died with <Signals.SIGILL: 4>.
[e] error while mapping {'program_name': '/Users/fconstan/miniconda3/envs/phylophlan/bin/diamond', 'params': 'blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0', 'input': '--query', 'database': '--db', 'output': '--out', 'version': 'version', 'command_line': '#program_name# #params# #input# #database# #output#'} test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna phylophlan_databases/phylophlan/phylophlan.dmnd test/test_some_genomes_phylophlan/tmp/map_dna GCA_002554315.1_ASM255431v1_genomic.b6o.bkp 1 False
[e] Command '['/Users/fconstan/miniconda3/envs/phylophlan/bin/diamond', 'blastx', '--quiet', '--threads', '1', '--outfmt', '6', '--more-sensitive', '--id', '50', '--max-hsps', '35', '-k', '0', '--query', 'test/test_some_genomes_phylophlan/tmp/uncompressed/GCA_002554315.1_ASM255431v1_genomic.fna', '--db', 'phylophlan_databases/phylophlan/phylophlan.dmnd', '--out', 'test/test_some_genomes_phylophlan/tmp/map_dna/GCA_002554315.1_ASM255431v1_genomic.b6o.bkp']' died with <Signals.SIGILL: 4>.
[e] gene_markers_identification crashed
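The "died with <Signals.SIGILL: 4>" wording matches how Python's subprocess.CalledProcessError reports a child killed by a signal: DIAMOND did not exit with an error status, it was killed by signal 4 (SIGILL, illegal instruction). Python reports such a death as a negative return code, which the hypothetical helper below decodes. One common cause of SIGILL is a binary built for CPU instructions the host processor does not support, but that is an assumption here, not a diagnosis made in this thread.

```python
import signal


def describe_returncode(returncode):
    """Explain a subprocess return code.

    Python's subprocess module reports a child killed by signal N
    as return code -N (POSIX), so -4 corresponds to SIGILL.
    """
    if returncode < 0:
        return "killed by signal " + signal.Signals(-returncode).name
    return "exited with status " + str(returncode)
```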
@Fedorov113 We do not provide a MetaPhlAn 3 tree because we have not built one yet; the previous tree, built from the MetaPhlAn 2 reference genomes, was computed for a different project.
Regarding the DIAMOND error above, you should post it on https://github.com/biobakery/phylophlan/
Yeah, I am building the tree myself right now and will share the results.
I ran into a problem: I take the genome information from mpa_pkl['taxonomy'], extracting each GCA_xxxx accession from the |t__ part of the keys in mpa_pkl['taxonomy'].keys().
However, there are some instances, like
k__Bacteria|p__Tenericutes|c__Mollicutes|o__Mycoplasmatales|f__Mycoplasmataceae|g__Mycoplasma|s__Mycoplasma_wenyonii|t__GCA_002705755
and
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Microbacteriaceae|g__Microbacterium|s__Microbacterium_esteraromaticum|t__GCA_002705755
These two clades share the same accession, GCA_002705755, which is obviously an error.
Or should I take the genome information from mpa_pkl['markers'] instead?
Thank you!
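A small Python sketch for finding every collision of this kind, assuming (as in the examples above) that strain-level keys end with |t__<accession>. The function name shared_accessions is hypothetical; it groups taxonomy keys by accession and keeps only the accessions listed under more than one clade.

```python
from collections import defaultdict


def shared_accessions(taxonomy_keys):
    """Return {accession: [clades]} for accessions listed under more than one clade."""
    clades_by_accession = defaultdict(list)
    for clade in taxonomy_keys:
        last_rank = clade.split("|")[-1]
        # Only strain-level (t__) entries carry a genome accession.
        if last_rank.startswith("t__"):
            clades_by_accession[last_rank[len("t__"):]].append(clade)
    return {
        accession: clades
        for accession, clades in clades_by_accession.items()
        if len(clades) > 1
    }
```

Run on mpa_pkl['taxonomy'].keys(), this would list all shared accessions at once rather than finding them one by one.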
I believe we should reopen this issue until we get a proper solution.
I selected 10360 bacterial and archaeal genomes (one genome per clade, that is, per s__*** in mpa_pkl), and PhyloPhlAn 3.0 has been running for 30+ hours on 100 cores. This is clearly a task that not everyone can perform, given the computational resources it requires.
cc @fasnicar
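The genome selection described above (one genome per species-level clade, bacteria and archaea only) could be sketched in Python as follows. This is an illustration under assumptions, not the code actually used: keys are assumed to end with |t__<accession>, and keeping the first accession seen per species is an arbitrary placeholder rule. The function name one_genome_per_species is hypothetical.

```python
def one_genome_per_species(taxonomy_keys, kingdoms=("k__Bacteria", "k__Archaea")):
    """Map each species-level lineage to one t__ accession.

    Keeps the first accession encountered per species; this rule is
    an arbitrary placeholder, not a quality-based choice.
    """
    chosen = {}
    for clade in taxonomy_keys:
        ranks = clade.split("|")
        # Skip other kingdoms and any key that is not strain-level.
        if ranks[0] not in kingdoms or not ranks[-1].startswith("t__"):
            continue
        species_lineage = "|".join(ranks[:-1])
        chosen.setdefault(species_lineage, ranks[-1][len("t__"):])
    return chosen
```

A real selection might prefer reference or higher-quality genomes instead of the first one listed.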
@Fedorov113, were you able to generate the tree?
@fconstancias yes, but I haven't checked it in detail yet. Its tips also need to be renamed from GCA_** identifiers to ChocoPhlAn's s__*** names. This data is available in the notebook prepare_genomes_and_metadata.
The code is a bit messy and I will return to it in a couple of weeks, but I would be happy if you'd like to help; here is the repo
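Renaming the tips can be done with a plain string substitution on the Newick text. The Python sketch below assumes unquoted labels with no whitespace and a ready-made mapping from GCA identifiers to s__ names; rename_tips is a hypothetical name. Labels that appear right after '(' or ',' are tips, while labels after ')' are internal-node labels (such as support values) and are left alone.

```python
import re

# A tip label follows '(' or ',' and runs until ':', ',', ')' or ';'.
# Internal-node labels follow ')', so the lookbehind skips them.
_TIP = re.compile(r"(?<=[(,])([^(),:;]+)")


def rename_tips(newick, mapping):
    """Replace tip labels found in mapping; leave all other labels unchanged."""
    return _TIP.sub(lambda match: mapping.get(match.group(1), match.group(1)), newick)
```

For example, rename_tips("(GCA_1:0.1,(GCA_2:0.2,GCA_3:0.3)95:0.4);", {"GCA_1": "s__A", "GCA_2": "s__B"}) renames the first two tips and keeps GCA_3 and the support value 95 as they are. A dedicated tree library would handle quoted labels and comments more robustly; the regex keeps the sketch dependency-free.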
Hello everyone,
Many thanks @Fedorov113 for doing this. We are actually working on building a reference phylogeny for MetaPhlAn 3.0 using PhyloPhlAn 3.0. I think we should be able to release it in a few weeks.
Many thanks, Francesco
Dear @fasnicar,
Any update regarding the metaphlan3 tree?
Thanks
You can find the Newick tree here: https://github.com/biobakery/MetaPhlAn/tree/3.0/metaphlan/utils
There is also an R script there for calculating UniFrac distances from a merged MetaPhlAn profile file.
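For readers unfamiliar with UniFrac, here is a minimal Python sketch of the unweighted variant: the distance is the branch length covered by only one of the two samples divided by the branch length covered by either. This is not the R script's code and it ignores abundances (the weighted variant uses them); the tree representation (a dictionary from each node to its parent and branch length) and all function names are hypothetical.

```python
def branches_covered(tips, parent):
    """Return every node whose branch lies on a path from a present tip to the root.

    parent maps node -> (parent_node, branch_length); the root has no entry.
    """
    covered = set()
    for tip in tips:
        node = tip
        # Walk toward the root; stop early once an already-covered path is reached.
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node][0]
    return covered


def total_length(nodes, parent):
    """Sum the branch lengths of the given nodes."""
    return sum(parent[node][1] for node in nodes)


def unweighted_unifrac(sample_a, sample_b, parent):
    """Unweighted UniFrac between the sets of tips present in two samples."""
    covered_a = branches_covered(sample_a, parent)
    covered_b = branches_covered(sample_b, parent)
    observed = total_length(covered_a | covered_b, parent)
    if observed == 0:
        return 0.0
    unique = total_length(covered_a ^ covered_b, parent)
    return unique / observed
```

With a root R, an internal node X (length 1.0) holding tips A (1.0) and B (2.0), and a tip C (3.0) on the root, samples {A} and {B} share only X's branch, giving 3.0 / 4.0 = 0.75.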
The Newick file provided for MetaPhlAn 4 is not working.
Can we create a tree at other taxonomic levels, e.g., genus and family? Currently I can only do UniFrac calculations at the species level.
Thanks a lot for releasing MetaPhlAn 3. Is there any available MetaPhlAn 3 phylogenetic tree, or an easy way to compute one?
Thanks a lot.