workshop: Microbial genomics, Sep 24-25, 2015

ctb commented 8 years ago

Coordinate with @tracykteal and @biobenkj.

ctb commented 8 years ago

@biobenkj says:

In terms of what I would like to teach, I was thinking of doing everything on Amazon EC2 (unless Davis has free HPC access?) and following this sort of format:

- Basic command line operations (traversing file hierarchies, renaming files, removing things, etc.)

- Getting onto an EC2 machine

- Quality control of short reads

- Reference-based bacterial RNA-seq analysis (shameless plug for my workflow ;) and others), with an aside on watching for potential batch effects in the data

  This could be a time to go through some sort of R tutorial as well, though that may be too much...

- Reference-free assembly and annotation?

- Variant calling

- Command line BLAST?

- Some bioinformatics one-liners from Stephen Turner: http://www.gettinggeneticsdone.com/2013/10/useful-linux-oneliners-for-bioinformatics.html

tracykteal commented 8 years ago

Data Carpentry has a set of genomics lessons, still somewhat in development, that would be relevant: https://github.com/datacarpentry/?utf8=✓&query=genomics

Maybe the shell-genomics and cloud-computing-genomics lessons in particular.

It seems like the focus here is on working with microbial genomes rather than community data? Microbial genomics is still a little broad, though. It might be worth deciding what we want people to know by the end of the workshop.

Maybe the theme could be 'so you just sequenced a microbial genome'? Then you could do:

- Getting your data back as FASTQ and setting up the project
- Logging in to Amazon for the analysis
- Some command line on Amazon
- Bioinformatics analyses on Amazon
  - Quality filtering: FastQC, Trimmomatic
  - Reference-free assembly and annotation
  - Assembly and annotation with a reference (for those who have re-sequenced rather than sequenced), with discussion of the strengths and weaknesses of each strategy
  - Creating a BLAST database with your genome (or the reads?)
  - Querying that database with some things you care about
  - A few other things people want to do with microbial genomes: SNP calling? Looking for motifs? Finding a gene of interest and looking at its flanking regions? Comparative genomics beyond SNP calling?
  - Introducing a viewer might also be useful

ctb commented 8 years ago

I think we could do either:

day 1: quality control of reads & variant calling & viz.
day 2: RNAseq analysis & differential expression follow-through

or:

day 1: quality control of reads & genome assembly & annotation with Prokka.
day 2: RNAseq analysis & differential expression follow-through

Preferences? I feel like the latter is more useful in this day and age, but would appreciate comments and alternative thoughts :)

I can put together QC, assembly, variant calling, and annotation, but would appreciate help with the RNAseq and differential expression follow-through. Do you have a data set or three already handy, @biobenkj?

biobenkj commented 8 years ago

Absolutely, I have a few data sets we can use. I prefer the latter and can help with the RNAseq and DGE follow-through.

All work being done on Amazon EC2?
Are we assuming some familiarity with command line operations? If not, should we spend some time on the basics?
Interleave comments about reproducibility and robustness of results
Mostly focus on Illumina technologies, or include PacBio/Nanopore/etc.?

ctb commented 8 years ago

Great! I'll put together a page.

Yes, all on EC2. I'll bankroll it (for you) if we can't get credits.
No assumptions :)
Those come naturally to me...
I think we can do a co-assembly with Nanopore or other long reads if we have 'em.

ctb commented 8 years ago

@biobenkj @tracykteal could you do a quick once-over? http://dib-training.readthedocs.org/en/pub/2015-09-24-microbes.html - note, can edit at https://github.com/dib-lab/dib-training/blob/pub/2015-09-24-microbes.rst

biobenkj commented 8 years ago

:) The page looks great. Made a couple of typo corrections. I will make the rst files for the online tutorial and look for comments.

In terms of RNAseq analysis and DGE I was thinking something like:

Day 2:

- Setting up your project and getting your data
- QC and trimming
- Picking a workflow (or workflows)
- Considerations for reference and non-reference based RNAseq analysis
  - Using the assembly from the previous day?
  - Generating a GTF annotation file
  - Improving the reference with RNAseq
- Considerations for confounding variables (e.g. batch effects) and how to look for them (more tools!)
- Useful visualization for results (scatter plots, degust! at http://vicbioinformatics.com/degust/, etc.)

Thoughts?

ctb commented 8 years ago

OK - are you familiar with reST/sphinx? We can use Markdown too if you prefer.

Here's how I've been doing things:

2015-may-nonmodel.readthedocs.org (source: https://github.com/ngs-docs/2015-may-nonmodel)

and I can give you repo access to ngs-docs, or whatever.

Note the tutorial on ReadTheDocs: https://github.com/ngs-docs/angus/blob/2015/week3/CTB_github_editing.rst

...but I'm happy to set all of that up; you just produce the docs :) :).

+1 sounds good.

--titus

tracykteal commented 8 years ago

Thanks, page looks good, and I like the focus on genome assembly and annotation with transcriptomics after that.

It seems like things are pretty well mapped out. Are there any components that would be helpful to have me teach, or @biobenkj do you have things mapped out already? What's the plan for the differential expression analysis component? DESeq for some R, or too much?

biobenkj commented 8 years ago

@tracykteal I don't have things explicitly mapped out yet, but I really liked the comparison between common DGE methods (DESeq2, edgeR, and limma/voom) from Meeta (https://github.com/ngs-docs/msu_ngs2015/blob/master/hands-on.Rmd) used in NGS '15 alumni week. In my own work/experience I primarily use edgeR over DESeq2, as I find that getting things into the ExpressionSet object can be a pain, correcting for confounding variables can be more straightforward, and I prefer the underlying assumptions used for DGE.

For the DGE analysis section I was thinking of explaining some of the fundamental considerations for differential expression:

- Replicate number
- Negative binomial distribution
- Confounding variables (e.g. batch effects)
- Whether you can even do DGE (e.g. sample grouping with an MDS plot)
- ...other things?

Anything in particular you would like to teach? What workflow do you use when analyzing bacterial RNAseq data? Are there visualization tools that you really like? Incorporating some of the Data Carpentry R and even command line lessons might be useful, though the workshop is 9 a.m. to 3 p.m. with a lunch break.

Thoughts?

ctb commented 8 years ago

+1 for your schedule, Ben. I don't think there's time to do more than mention R as something people might like to learn; we will have plenty of workshops for people who want to learn more.

biobenkj commented 8 years ago

@ctb I am reasonably familiar with reST/Sphinx and will have a look at how you've been doing things for the May 2015 non-model-organism workshop.

I will follow(ish) the tutorial to generate the docs.

biobenkj commented 8 years ago

Any issue with doing all the RNA-seq analysis in a Jupyter notebook on EC2 for the workshop?

ctb commented 8 years ago

Our previous experience has been that mixing shell and Python confuses everyone.

ctb commented 8 years ago

(so, short answer, I don't think it'll work well. We've had better luck running shell commands in the shell, and graphing/plotting in ipynb.)

C. Titus Brown, ctbrown@ucdavis.edu

biobenkj commented 8 years ago

Alright. Sounds good.
