Prototype HIV NGS pipeline

spond commented 8 years ago

Following our discussion with @nekrut, I am creating this issue to outline the essential components of a barebones HIV (or other short RNA virus) genomic analysis pipeline.

Key steps for amplicon data (a very preliminary outline; many steps need to be fleshed out, and the initial focus is on wrapping existing tools that we have worked with in the past):

Initial QC on NGS data
- [x] Standard FASTQ filtering tools (q-score, read length, N's), e.g. qfilt
- [x] paired end merging
(If present in run) random tag (PRIMER ID) processing
- [ ] Binning and consensus construction
- [ ] Filtering using an error model, like the one from the Swanstrom Lab
Map to reference (amplicon data), FASTQ => BAM
- [ ] Custom mappers (direct alignment, e.g. Smith Waterman)
- [x] [bealign](https://github.com/veg/BioExt/)
- [ ] Standard short read mappers
- [ ] Coverage maps
Filter errors (amplicon data). Need more options here
- [ ] Model-based, multinomial filtering. Currently aligned FASTA => filtered aligned FASTA; should be BAM => filtered BAM
- [ ] See if multiple alignments can be replaced with VCFs or something along these lines
Basic post processing based on TN93
- [ ] Generate consensus sequences
- [ ] Sliding window haplotypes (simple read merging, large overlap)
- [ ] Report amino-acid frequencies by position (and DRAMs) -- need to write
Basic phylogenetics (need to pull out some code from https://github.com/veg/HIV-NGS)
HIV clustering
- [ ] hivnetworkcsv
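
To make the first QC step concrete, here is a minimal sketch of the kind of per-read filter described above (q-score, read length, N's). It is not qfilt's actual algorithm; the function names, the thresholds, and the Phred+33 offset are all assumptions chosen for illustration, and the parser assumes plain four-line FASTQ records with no blank lines.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def passes_qc(seq, qual, min_mean_q=20, min_length=50, max_n=0, phred_offset=33):
    """Return True when a read meets the length, N-count and mean-quality thresholds.

    All default thresholds are hypothetical, not values taken from qfilt.
    """
    if len(seq) < min_length:
        return False
    if seq.upper().count("N") > max_n:
        return False
    scores = [ord(c) - phred_offset for c in qual]
    if not scores:
        return False
    return mean(scores) >= min_mean_q


if __name__ == "__main__":
    records = [
        "@r1", "ACGTACGT", "+", "IIIIIIII",   # good read
        "@r2", "ACGNACGT", "+", "IIIIIIII",   # contains an N
        "@r3", "ACGTACGT", "+", "!!!!!!!!",   # mean quality 0
    ]
    for header, seq, qual in parse_fastq(records):
        print(header, passes_qc(seq, qual, min_length=5))
```

A real wrapper would stream from a file and write the surviving reads back out as FASTQ; the point here is only the three checks named in the checklist.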

A few references to existing (published) pipelines

WRAP DATAMONKEY INTO GALAXY

stevenweaver commented 8 years ago

2016-11-16 update: qfilt, bealign, and tn93 have been added to the test toolshed hosted by the core Galaxy team at PSU. The tools currently reside in the veg fork of the tools-iuc repo, and they are installed on local instances of Galaxy in the VEG group. All planemo tests pass, and bioext's Python packaging bug (an issue where users installing through pip would come across "header not found" errors) has been resolved.

bgruening commented 8 years ago

@stevenweaver top!!!

stevenweaver commented 8 years ago

@bgruening this progress wouldn't have been possible without @davebx

bgruening commented 8 years ago

That's what makes Galaxy so great - community! @davebx thanks a bunch! And let me know if you need help with the pending conda package.

nekrut commented 6 years ago

2/20 Meeting Notes @nekrut @davebx

[x] Check if qfilt, bealign, and TN93 are IUC ready
[x] See if duplex pipeline can be used for PRIMER ID
[x] Show how to combine samples in Galaxy

NEXT MEETING FRI @ 3 (March 2)

spond commented 6 years ago

@nekrut: here's the paper on PRIMER ID processing: http://jvi.asm.org/content/89/16/8540.full.pdf+html
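
For orientation, the binning and consensus construction step from the outline could look roughly like the sketch below. This is not the method from the paper: it assumes the random tag is a fixed-length prefix of each read, that the remaining inserts are already trimmed to the same coordinates, and that a simple column-wise majority vote is good enough. The function names and the `min_bin_size` default are hypothetical.

```python
from collections import Counter, defaultdict


def bin_by_tag(reads, tag_length):
    """Group reads by their leading tag (assumed to be a fixed-length prefix)."""
    bins = defaultdict(list)
    for read in reads:
        bins[read[:tag_length]].append(read[tag_length:])
    return bins


def majority_consensus(sequences):
    """Column-wise majority vote; ties become 'N'.

    zip() stops at the shortest sequence, so inputs are assumed equal length.
    """
    consensus = []
    for column in zip(*sequences):
        top = Counter(column).most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            consensus.append("N")
        else:
            consensus.append(top[0][0])
    return "".join(consensus)


def tag_consensus(reads, tag_length, min_bin_size=3):
    """Return {tag: consensus} for bins with at least min_bin_size reads."""
    bins = bin_by_tag(reads, tag_length)
    return {
        tag: majority_consensus(seqs)
        for tag, seqs in bins.items()
        if len(seqs) >= min_bin_size
    }


if __name__ == "__main__":
    reads = ["AAAACGT", "AAAACGT", "AAAACTT", "CCCCGGA", "CCCCGGA"]
    print(tag_consensus(reads, tag_length=4))
```

Bins below the size cutoff are dropped rather than reported, since a consensus of one or two reads cannot outvote a sequencing error.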

davebx commented 6 years ago

By my estimation, bealign is IUC ready; TN93 and qfilt need a few best-practices tweaks.

nekrut commented 6 years ago

Examples of HIV processing pipeline

An example paper is Gianella et al. 2016

Take individual patient level reads; there are multiple time points per individual
Error correct them / assemble (these are short amplicons, so assembly is not even that important)
Build phylogenies and interpret the results

Another example, which now uses full-length genome data, is Zanini et al. 2015.

The basic workflow

Assemble individual HIV-genomes
Call variants relative to the genome
Perform downstream analysis
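
As a rough illustration of the "call variants relative to the genome" step, the sketch below counts bases per reference position and reports non-reference bases above a frequency cutoff. It assumes reads are already aligned to reference coordinates and supplied as gapped strings (as in an aligned FASTA), applies no error model, and uses hypothetical names and thresholds; it is not how any of the cited pipelines work.

```python
from collections import Counter


def column_counts(aligned_reads, length):
    """Count bases at each reference position, ignoring gaps ('-') and 'N'."""
    counts = [Counter() for _ in range(length)]
    for read in aligned_reads:
        for pos, base in enumerate(read[:length]):
            if base not in ("-", "N"):
                counts[pos][base] += 1
    return counts


def call_variants(reference, aligned_reads, min_frequency=0.05, min_coverage=10):
    """Return (1-based position, ref base, alt base, frequency) tuples.

    min_frequency and min_coverage are illustrative defaults only.
    """
    variants = []
    for pos, counts in enumerate(column_counts(aligned_reads, len(reference))):
        coverage = sum(counts.values())
        if coverage < min_coverage:
            continue
        for base, n in sorted(counts.items()):
            frequency = n / coverage
            if base != reference[pos] and frequency >= min_frequency:
                variants.append((pos + 1, reference[pos], base, frequency))
    return variants


if __name__ == "__main__":
    reference = "ACGT"
    reads = ["ACGT"] * 8 + ["ATGT"] * 2
    for variant in call_variants(reference, reads, min_frequency=0.1):
        print(variant)
```

The same per-position counts also give a consensus sequence (take the most common base in each column), which is why the two steps are often implemented together.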

nekrut commented 6 years ago

Meeting 5/30

Wrapping of phylotree continues
Move galaxy.hyphy.org tools to test galaxy
Demo one wrapped HyPhy tool next meeting (June 6)
- https://github.com/veg/tools-iuc/issues/42

stevenweaver commented 6 years ago

Meeting 6/6

Wrapping of phylotree continues - We have a working implementation, pushing to branch for review by galaxy folks.
Will need to use the phylotree widget, or a visualization built on top of it, for branch selection for some HyPhy methods. After we've included the visualization, Sam will see how to upload the Newick string exported from the viz to a new history item.
Work on FUBAR will begin.

davebx commented 6 years ago

Pull requests created for tn93, qfilt, and hivnetworkcsv

davebx commented 6 years ago

Meeting June 20:

FUBAR wrapped
qfilt, hivclustering, BioExt merged
torque configured on galaxy.hyphy
Phylotree viz close to complete, only a bit of glue needed to plug into galaxy

davebx commented 6 years ago

Meeting July 18:

Discussed current and upcoming development
Located a potential supplier of the glue mentioned in the previous comment

galaxyproject / tools-iuc

Prototype HIV NGS pipeline #1020