New module for cuffnorm

Name of the tool

cuffnorm

Tool homepage

http://cole-trapnell-lab.github.io/cufflinks/

Tool description

Normalizes the expression levels from a set of RNA-Seq libraries.

Tool output

cuffnorm_example.zip

The general output structure of cuffnorm is shown below.

cuffnorm
├── cds.attr_table
├── cds.count_table
├── cds.fpkm_table
├── genes.attr_table
├── genes.count_table
├── genes.fpkm_table
├── isoforms.attr_table
├── isoforms.count_table
├── isoforms.fpkm_table
├── run.info
├── samples.table
├── tss_groups.attr_table
├── tss_groups.count_table
└── tss_groups.fpkm_table

Among them, genes.fpkm_table is the table of estimated gene abundance levels, and the sample names can be extracted from its header line. (The "_0" suffix after each sample name is added automatically by the analysis tool.)
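As a rough illustration of the header parsing described above, here is a minimal Python sketch. It assumes the table is tab-delimited and that its first column holds the gene identifier (the column name `tracking_id` and the sample names in the example are assumptions, not taken from the attached data). The function name `sample_names_from_header` is hypothetical and not part of MultiQC.

```python
import csv
import io


def sample_names_from_header(handle):
    """Return sample names taken from the header line of an fpkm_table.

    Assumptions: the file is tab-delimited, the first column is the
    gene identifier, and each remaining column is a sample name that
    may carry the automatically added "_0" suffix.
    """
    header = next(csv.reader(handle, delimiter="\t"))
    names = []
    for column in header[1:]:
        # Strip only the "_0" suffix mentioned above; leave other names as-is.
        names.append(column[:-2] if column.endswith("_0") else column)
    return names


# Made-up two-sample table used purely for illustration.
example = io.StringIO("tracking_id\tctrl_0\ttreated_0\nGENE1\t1.5\t0.0\n")
print(sample_names_from_header(example))
```

A real module would read the file contents handed over by MultiQC's file search rather than an in-memory string.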

The samples.table file contains statistics about the sample library size during the normalization process. Other than that, there are no ready-to-use statistics for MultiQC.

Log filename pattern

run.info

Data suitable for MultiQC plot(s)

RNA-Seq mainly uses mRNA as input material. (The mRNA is typically captured via its poly-A tail.) For this kind of data, a common QC check is to count how many genes, out of all genes, have RPKM values above a certain threshold. Based on the following paper, that threshold tends to be 0.3. (https://doi.org/10.1371/journal.pcbi.1000598)

For example, we could create a table like the one below.

how_many_gene_expressed_example_table

If this were visualized as a bar plot or similar, I think it could enrich MultiQC's QC content.
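The counting step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input is a tab-delimited genes.fpkm_table whose first column is the gene identifier, "expressed" means a value strictly above the threshold, and non-numeric entries are treated as not expressed. The function name `count_expressed_genes` and the example values are hypothetical.

```python
import csv
import io


def count_expressed_genes(handle, threshold=0.3):
    """Count, per sample column, how many genes exceed `threshold`.

    Returns (total_genes, {column_name: expressed_count}).
    Assumptions: tab-delimited input, first column is the gene ID,
    all other columns are per-sample abundance values.
    """
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]
    expressed = dict.fromkeys(samples, 0)
    total = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        total += 1
        for sample, value in zip(samples, row[1:]):
            try:
                if float(value) > threshold:
                    expressed[sample] += 1
            except ValueError:
                pass  # non-numeric entry: count as not expressed
    return total, expressed


# Made-up values used purely for illustration.
example = io.StringIO(
    "tracking_id\tctrl_0\ttreated_0\n"
    "G1\t1.5\t0.0\n"
    "G2\t0.2\t0.4\n"
    "G3\t10\t0.3\n"
)
total, counts = count_expressed_genes(example)
for sample, n in counts.items():
    print(f"{sample}\t{n}\t{total}")
```

The per-sample counts and the total could then feed a bar plot, with the threshold exposed as a configurable option instead of being fixed at 0.3.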

I have many ideas for QC topics based on RNA-Seq data, and I'm also curious how I could create my own module for this.

Most interesting data for the General Stats table

No response

Before submitting

[x] I have included example data (zipped, not pasted) that can be used to write the module.
