dib-lab / dib-training

Web site for Data Intensive Training at UC Davis
https://dib-training.readthedocs.io/en/pub/

workshop: Microbial genomics, Sep 24-25, 2015 #5

ctb closed this issue 8 years ago

ctb commented 8 years ago

Coordinate with @tracykteal and @biobenkj.

ctb commented 8 years ago

@biobenkj says:

In terms of what I would like to teach, I was thinking of doing everything on Amazon EC2 (unless Davis has free HPC access?) and following this sort of format:

-Basic command line operations (traversing file hierarchies, changing file names, removing things, etc.)

-Getting onto an EC2 machine

-Quality control of short reads

-Reference-based bacterial RNA-seq analysis (shameless plugging of my workflow ;) and others)

  * an aside on paying attention to potential batch effects in data

-Variant calling

-Command line BLAST?

-Some bioinformatics one-liners from Stephen Turner: http://www.gettinggeneticsdone.com/2013/10/useful-linux-oneliners-for-bioinformatics.html

tracykteal commented 8 years ago

Data Carpentry has a set of genomics lessons, still a bit in development, that would be relevant: https://github.com/datacarpentry/?utf8=✓&query=genomics

Maybe the shell-genomics and cloud-computing-genomics lessons in particular.

It seems like the focus here is on working with microbial genomes rather than community data? Microbial genomics is still a little broad though. It might be worth deciding what we want people to know by the end of the workshop.

Maybe the theme could be 'so you just sequenced a microbial genome'? Then you could do:

ctb commented 8 years ago

I think we could do either:

or:

day 1: quality control of reads & genome assembly & annotation w/prokka. day 2: RNAseq analysis & differential expression followthrough

Preferences? I feel like the latter is more useful in this day and age but would appreciate comments and alt thoughts :)

I can put together QC, assembly, variant calling, and annotation, but would appreciate help with RNAseq and diff expr follow through. Do you have a data set or three already handy, @biobenkj?

biobenkj commented 8 years ago

Absolutely I have a few data sets we can use. I prefer the latter and can aid with the RNAseq and DGE follow through.

ctb commented 8 years ago

Great! I'll put together a page.

ctb commented 8 years ago

@biobenkj @tracykteal could you do a quick once-over? http://dib-training.readthedocs.org/en/pub/2015-09-24-microbes.html - note, can edit at https://github.com/dib-lab/dib-training/blob/pub/2015-09-24-microbes.rst

biobenkj commented 8 years ago

:) The page looks great. Made a couple typo corrections. I will make the rst files for the online tutorial and look for comments.

In terms of RNAseq analysis and DGE I was thinking something like:

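Day 2:

  • Setting up your project and getting your data
  • QC and trimming
  • Picking a workflow(s)
  • Considerations for reference and non-reference based RNAseq analysis
    • Using the assembly from previous day?
    • Generate gtf annotation file
    • Improving the reference with RNAseq
  • Considerations for confounding variables (e.g. batch effects) and how to look for them (more tools!)
  • Useful visualization for results (scatter plots, degust! - http://vicbioinformatics.com/degust/, etc.)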

Thoughts?

ctb commented 8 years ago

On Tue, Sep 01, 2015 at 07:17:09AM -0700, Ben Johnson wrote:

> :) The page looks great. Made a couple typo corrections. I will make the rst files for the online tutorial and look for comments.

OK - are you familiar with reST/sphinx? We can use Markdown too if you prefer.

Here's how I've been doing things:

2015-may-nonmodel.readthedocs.org https://github.com/ngs-docs/2015-may-nonmodel

and I can give you repo access to ngs-docs, or whatever.

Note tutorial on ReadTheDocs: https://github.com/ngs-docs/angus/blob/2015/week3/CTB_github_editing.rst

...but I'm happy to set all of that up, just produce the docs :) :).

> In terms of RNAseq analysis and DGE I was thinking something like:
>
> Day 2:
>
>   • Setting up your project and getting your data
>   • QC and trimming
>   • Picking a workflow(s)
>   • Considerations for reference and non-reference based RNAseq analysis
>     • Using the assembly from previous day?
>     • Generate gtf annotation file
>     • Improving the reference with RNAseq
>   • Considerations for confounding variables (e.g. batch effects) and how to look for them (more tools!)
>   • Useful visualization for results (scatter plots, degust! - http://vicbioinformatics.com/degust/, etc.)

+1 sounds good.

--titus

tracykteal commented 8 years ago

Thanks, page looks good, and I like the focus on genome assembly and annotation with transcriptomics after that.

It seems like things are pretty well mapped out. Are there any components that would be helpful to have me teach, or @biobenkj do you have things mapped out already? What's the plan for the differential expression analysis component? DESeq for some R, or too much?

biobenkj commented 8 years ago

@tracykteal I don't have things explicitly mapped out yet, but really liked the comparison between common DGE methods (DESeq2, edgeR, and limma/voom) from Meeta (https://github.com/ngs-docs/msu_ngs2015/blob/master/hands-on.Rmd) used in NGS '15 alumni week. In my own work/experience I primarily use edgeR over DESeq2, as I find that getting things into the ExpressionSet object can be a pain, correcting for confounding variables can be more straightforward, and I prefer the underlying assumptions utilized for DGE.

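For the DGE analysis section I was thinking of explaining some of the fundamental considerations for differential expression:

  • Replicate number
  • Negative binomial distribution
  • Confounding variables (e.g. batch effects)
  • Whether you can even do DGE (e.g. sample grouping with MDS plot)
  • ...other things?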

Anything in particular you would like to teach? What workflow do you use when analyzing bacterial RNAseq data? Are there visualization tools that you really like? Incorporating some of the Data Carpentry R and even command line lessons might be useful, though the workshop is 9 a.m. to 3 p.m. with a lunch break.

Thoughts?

ctb commented 8 years ago

On Tue, Sep 01, 2015 at 09:33:41AM -0700, Ben Johnson wrote:

> @tracykteal I don't have things explicitly mapped out yet, but really liked the comparison between common DGE methods (DESeq2, edgeR, and limma/voom) from Meeta (https://github.com/ngs-docs/msu_ngs2015/blob/master/hands-on.Rmd) used in NGS '15 alumni week. In my own work/experience I primarily use edgeR over DESeq2, as I find that getting things into the ExpressionSet object can be a pain, correcting for confounding variables can be more straightforward, and I prefer the underlying assumptions utilized for DGE.
>
> For the DGE analysis section I was thinking of explaining some of the fundamental considerations for differential expression:
>
>   • Replicate number
>   • Negative binomial distribution
>   • Confounding variables (e.g. batch effects)
>   • Whether you can even do DGE (e.g. sample grouping with MDS plot)
>   • ...other things?
>
> Anything in particular you would like to teach? What workflow do you use when analyzing bacterial RNAseq data? Are there visualization tools that you really like? Incorporating some of the Data Carpentry R and even command line lessons might be useful, though the workshop is 9 a.m. to 3 p.m. with a lunch break.
>
> Thoughts?

+1 for your schedule, Ben. I don't think there's time to do more than mention R as something people might like to learn; we will have plenty of workshops for people who want to learn more.
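
(Editorial aside, not part of the original thread.) For readers unfamiliar with the "negative binomial distribution" item in the list of DGE considerations, here is a minimal Python sketch of why count-based DGE methods tend to favor it over a Poisson model. Under the usual mean/dispersion parameterization the variance is mean + dispersion * mean^2, so it can absorb replicate-to-replicate variability beyond Poisson noise. This is an illustration only, not code from edgeR, DESeq2, or limma/voom; the function name `nb_variance` and the example numbers are hypothetical.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean.

    Uses the mean/dispersion parameterization:
        var = mean + dispersion * mean**2
    With dispersion == 0 this reduces to the Poisson case (var == mean).
    """
    if mean < 0 or dispersion < 0:
        raise ValueError("mean and dispersion must be non-negative")
    return mean + dispersion * mean ** 2


# Hypothetical gene averaging 100 reads per sample (numbers are made up).
poisson_like = nb_variance(100, 0.0)     # no extra variability between replicates
overdispersed = nb_variance(100, 0.25)   # extra biological variability
print(poisson_like, overdispersed)
```

The larger the dispersion, the more replicates are needed to tell a real expression change from noise, which is one way the "replicate number" item in the same list connects to this one.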

biobenkj commented 8 years ago

@ctb I am reasonably familiar with reST/Sphinx and will have a look at how you've been doing things for the May 2015 non-model-organism workshop.

I will follow(ish) the tutorial to generate the docs.

biobenkj commented 8 years ago

Any issue with doing all the RNA-seq analysis in a Jupyter notebook on EC2 for the workshop?

ctb commented 8 years ago

Our previous experience has been that mixing shell and Python confuses everyone.

ctb commented 8 years ago

(so, short answer, I don't think it'll work well. We've had better luck running shell commands in the shell, and graphing/plotting in ipynb.)

C. Titus Brown, ctbrown@ucdavis.edu

biobenkj commented 8 years ago

Alright. Sounds good.