Closed sr320 closed 5 years ago
@kubu4 I believe you also did some gene prediction - could you write up a short results section on this analysis? - will need it for metaproteomic paper
Please add methods and results @ https://docs.google.com/document/d/1amaNX86VUDcXi0UGzgmHYt8QVe6fSuVcT1oYohlFCDM/edit?ts=5b918dab
OK, I've done a "quick" analysis of this and have a pretty nice figure that displays the taxonomic diversity of the metagenomics data (using Krona plot). However, I've only done this using BLASTp data (figured it would be faster). Is it more appropriate to classify things at the nucleotide level?
Go ahead and get what you have done into the paper, and start a nucleotide-level search.
Alrighty, I've added the info about the large metagenome assembly that I've done so far (includes nucleotide and protein-level taxonomic Krona plots).
Info has been added to Materials & Methods and the Results sections.
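For anyone wanting to build a similar Krona plot from their own classification output, here is a minimal Python sketch. It is not the pipeline used for the figures above. It assumes you already have one taxonomic lineage per sequence (the `lineages` input and the function name `krona_text_lines` are hypothetical) and collapses them into the tab-delimited "count, then lineage" lines that KronaTools' `ktImportText` reads.

```python
from collections import Counter


def krona_text_lines(lineages):
    """Collapse per-sequence lineages into 'count<TAB>rank1<TAB>rank2...' lines."""
    # Identical lineages are counted together; tuples make them hashable.
    counts = Counter(tuple(lineage) for lineage in lineages)
    # Sort for reproducible output: most abundant first, then alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ["\t".join([str(n), *lineage]) for lineage, n in ordered]


if __name__ == "__main__":
    # Hypothetical lineages, for illustration only (not real results).
    example = [
        ["Bacteria", "Proteobacteria"],
        ["Bacteria", "Proteobacteria"],
        ["Eukaryota", "Mollusca"],
    ]
    print("\n".join(krona_text_lines(example)))
```

Writing those lines to a file (say `krona_input.txt`, a made-up name) and running `ktImportText krona_input.txt -o krona.html` should produce the interactive chart, assuming KronaTools is installed.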
Just stumbled across some new software and visualizations for metagenomics:
http://merenlab.org/2016/06/22/anvio-tutorial-v2/
Will explore a bit more and try to use it. Looks insanely good/thorough, with great tutorials!
Update. Running Anvi'o, but it'll take a while. Saw this when looking at SLURM output today:
The output shows a "memory skull" and a 478GB of RAM notation in the blue area at the bottom, and progress on the contigs was only ~3-5 contigs/second. In response, I opted to put the Maker job on hold and launch this on the 500GB srlab node to see if the increased memory helps it progress faster. If not, I'll continue the Maker run and switch Anvi'o back to coenv. However, the progress I was seeing suggests that the Anvi'o analysis would take many weeks (or longer). :open_mouth:
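A rough way to turn a contigs/second rate into a wall-clock estimate is sketched below. The contig count is a placeholder, not the real size of this assembly, and the function name `estimated_days` is made up for illustration. It also assumes the rate stays constant, which it may well not.

```python
def estimated_days(n_contigs, contigs_per_second):
    """Days needed to process n_contigs at a constant rate."""
    if contigs_per_second <= 0:
        raise ValueError("contigs_per_second must be positive")
    seconds_per_day = 86_400
    return n_contigs / contigs_per_second / seconds_per_day


if __name__ == "__main__":
    n_contigs = 2_000_000  # hypothetical placeholder; substitute the real count
    for rate in (3, 5):  # the ~3-5 contigs/second seen in the SLURM output
        print(f"{rate} contigs/s: {estimated_days(n_contigs, rate):.1f} days")
```

The real contig count can be obtained with something like `grep -c '>' final.contigs.fa`. Note this only covers a single pass over the contigs, so a multi-step workflow would take correspondingly longer.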
Let’s find a quicker option - see review paper I posted
Will do. I know Anvi'o incorporates some of those programs into its pipeline (e.g. CONCOCT for sample binning).
For reference below
LikelyBin says "reasonable time" :)
| Tool | Features |
|---|---|
| **Sequencing technologies** | |
| Illumina | High throughput; low errors; short reads |
| Ion Torrent | High throughput; low errors; short reads |
| Pacific Biosciences | Medium throughput; high raw error rate; long reads |
| Oxford Nanopore | Medium throughput; high raw error rate; long reads |
| **Metagenomic assembly** | |
| MetaVelvet | Linux/Unix command-line tool; requires large amounts of RAM; may take several days to run |
| MetaVelvet-SL | Extension to MetaVelvet with similar characteristics; improved detection of chimeras |
| IDBA-UD | Linux/Unix command-line tool; requires large amounts of RAM; may take several days to run |
| Ray Meta | Linux/Unix command-line tool; designed for high-performance computing (HPC) and uses multiple cores; uses MPI; capable of dealing with very large datasets |
| Megahit | Linux/Unix command-line tool; lower memory and processor requirements, though only for certain options |
| Pell et al | Linux/Unix command-line code implemented as part of the khmer Python codebase (https://github.com/dib-lab/khmer) |
| MetAMOS | Linux/Unix command-line tool; depends on many other software tools; may require large amounts of RAM depending on the assembler used |
| **Binning** | |
| LikelyBin | Linux/Unix command-line; designed to run on simple commodity/desktop PCs in a reasonable time |
| PHYSCIMM | Linux/Unix command-line; requires 50Gb RAM and 24 hours to build models |
| MetaWatt | GUI-based; designed to run on desktop hardware |
| CONCOCT | Linux/Unix command-line; depends on other software; initially used Ray Meta for assembly |
| LSA | Linux/Unix command-line; uses 10s of Gb of RAM |
| **Gene prediction** | |
| MetaGeneAnnotator | Available as Linux/Unix command-line or through web interface (web interface limited to 10Mb) |
| Orphelia | Available as Linux/Unix command-line or through web interface (web interface limited to 30Mb) |
| Glimmer-MG | Available as Linux/Unix command-line; depends on other software; model building requires download of all current bacterial genomes |
| FragGeneScan | Available as Linux/Unix command-line; designed to run on commodity/desktop hardware in minutes/hours |
| Prokka | Available as Linux/Unix command-line; depends on other software; uses parallel processing |
| **Domain DBs** | |
| InterPro | A consortium of 14 protein/domain/family databases |
| InterProScan | Available as Linux/Unix command-line, via web interface, or via API |
| **Pathway databases** | |
| Reactome | Online resource for reactions/pathways; data available to download; accessible via web interface or via APIs |
| KEGG | Online resource for reactions/pathways; data available to download for a fee; accessible via web interface or via APIs |
| MetaCyc | Online resource for reactions/pathways; data available to download; accessible via web interface or via APIs |
| WikiPathways | Online resource for reactions/pathways; data available to download; accessible via web interface or via APIs |
| **Targeted gene discovery** | |
| Xander | Available as Linux/Unix command-line; depends on other software; requires user to build gene-specific models |
| **Data sharing and online portals** | |
| Meta4 | Accessible via a web interface once system has been set up; system set-up requires knowledge of Linux, Apache and Perl |
| MG-RAST | Online system with graphical user interface |
| EBI Metagenomics | Online system with graphical user interface; requires data to be in EBI ENA |
| IMG/M | Online system with graphical user interface |
fasta: http://gannet.fish.washington.edu/Atumefaciens/20190102_metagenomics_geo_megahit/megahit_out/final.contigs.fa
then use coverage file http://gannet.fish.washington.edu/Atumefaciens/20190102_metagenomics_geo_megahit/coverage.txt or samfile http://gannet.fish.washington.edu/Atumefaciens/20190102_metagenomics_geo_megahit/aln.sam.gz to look at abundance.
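As one possible way to look at abundance from the SAM file, here is a minimal Python sketch that counts primary mapped reads per contig and normalises by contig length. It relies only on the standard SAM layout (`@SQ` header lines with `SN`/`LN` tags, FLAG in column 2, reference name in column 3). The function names and the reads-per-kb measure are my own choices for illustration, and I have not checked this against the actual `aln.sam.gz`.

```python
import gzip
import sys
from collections import Counter


def reads_per_contig(sam_lines):
    """Return (lengths, counts) parsed from an iterable of SAM text lines."""
    lengths, counts = {}, Counter()
    for line in sam_lines:
        if line.startswith("@"):
            # Header section: contig lengths live on @SQ lines (SN = name, LN = length).
            if line.startswith("@SQ"):
                fields = line.rstrip("\n").split("\t")[1:]
                tags = dict(field.split(":", 1) for field in fields)
                lengths[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        counts[fields[2]] += 1
    return lengths, counts


def reads_per_kb(lengths, counts):
    """Length-normalised read counts: a crude proxy for relative abundance."""
    return {
        name: counts[name] * 1000 / length
        for name, length in lengths.items()
        if length > 0
    }


if __name__ == "__main__":
    # Usage: python sam_abundance.py aln.sam.gz  (script name is hypothetical)
    with gzip.open(sys.argv[1], "rt") as handle:
        contig_lengths, read_counts = reads_per_contig(handle)
    for contig, value in sorted(reads_per_kb(contig_lengths, read_counts).items()):
        print(f"{contig}\t{value:.3f}")
```

Reads per kb ignores differences in read length and sequencing depth between samples, so it is only suitable for a quick within-sample comparison of contigs.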