keep in mind

Some of these techniques were designed for microarrays.

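##################Quick tour following the DESeq2 documentation##################

Born in 2014; aim; similar tools include ...; limma assumes normally distributed data, whereas DESeq2 models counts with a negative binomial distribution.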

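##################Supplying data in the form DESeq2 requires##################

Load the count data and the sample information. There are format requirements: information a and information b are combined into information c. Follow the format shown in the official documentation.

step1, for information a

expression values

Note:

only the count values

step2, for information b

a description of each sample

step3, for information c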

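The data object (a DESeqDataSet in DESeq2) contains a and b, together with the design formula.

Note:

The column names of the expression matrix must match the row names of the sample (phenotype) table, in the same order.
In the design formula, put the variable of interest last and the variables to control for before it.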
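
The three inputs can be sketched as follows. This is only a minimal illustration: the counts are made up, and the sample names and the variables `batch` and `condition` are hypothetical. The DESeq2 call is guarded so that the base-R checks still run when the package is not installed.

```r
# Information a: raw integer counts, genes in rows, samples in columns (made-up values).
set.seed(42)
counts <- matrix(rpois(6 * 4, lambda = 50), nrow = 6,
                 dimnames = list(paste0("gene", 1:6),
                                 c("s1", "s2", "s3", "s4")))

# Information b: one row per sample, deliberately listed in a different order.
coldata <- data.frame(
  batch     = factor(c("b2", "b1", "b2", "b1")),
  condition = factor(c("treated", "control", "control", "treated")),
  row.names = c("s4", "s1", "s2", "s3")
)

# Reorder the sample table so its row names follow the count matrix columns.
coldata <- coldata[colnames(counts), , drop = FALSE]
stopifnot(identical(rownames(coldata), colnames(counts)))

# Information c: combine a and b with a design formula.
# The variable to control for (batch) comes first, the variable of interest last.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData   = coldata,
                                        design    = ~ batch + condition)
}
```

Reordering `coldata` by `colnames(counts)` before building the object is one simple way to enforce the ordering rule, since DESeq2 does not reorder the samples for you.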

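##################Data processing##################
#########First up: Normalization

To discuss: how TMM-type methods, scaling, and log relate to one another. TMM-type methods and scaling can both be called normalization; log is a transformation.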
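
The distinction can be illustrated with a small base-R sketch. The counts are made up, and plain library-size scaling stands in for the normalization step; it is a simplification and not TMM itself.

```r
# Made-up counts: sample s2 was sequenced twice as deeply as s1.
counts <- cbind(s1 = c(10, 100, 1000), s2 = c(20, 200, 2000))
rownames(counts) <- c("geneA", "geneB", "geneC")

# Normalization: divide each sample by a per-sample scaling factor
# (here library size relative to the mean library size).
lib_size     <- colSums(counts)
scale_factor <- lib_size / mean(lib_size)
normalized   <- sweep(counts, 2, scale_factor, "/")

# Transformation: change the scale of the values, e.g. log2 with a pseudocount.
transformed <- log2(normalized + 1)
```

After normalization the two samples become comparable; the log step only changes the scale on which those comparable values are expressed.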

ExpressionNormalizationWorkflow

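On PC loadings

Detailed discussion on Biostars: PVCA; methods are divided into supervised and unsupervised.

Harvard introduction to DGE: explains reduced and LRT, and that for the LRT genes are selected not by log2 fold changes but by tightening the p-value cutoff.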

reduced and LRT

reduced: for test="LRT", a reduced formula to compare against
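
A minimal sketch of a likelihood ratio test, assuming DESeq2 is installed. The data are simulated with `makeExampleDESeqDataSet()`, and the `batch` variable and the 0.01 cutoff are hypothetical choices made for illustration.

```r
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)

  # Simulated data set: 200 genes, 8 samples, with a two-level 'condition' factor.
  dds <- makeExampleDESeqDataSet(n = 200, m = 8)

  # Hypothetical batch variable, balanced across the two conditions.
  dds$batch <- factor(rep(c("b1", "b2"), times = 4))
  design(dds) <- ~ batch + condition

  # Full model: ~ batch + condition. Reduced model: ~ batch.
  # The LRT asks whether adding 'condition' improves the fit.
  dds <- DESeq(dds, test = "LRT", reduced = ~ batch)
  res <- results(dds)

  # Select genes with a stricter adjusted p-value cutoff rather than a fold-change cutoff.
  n_sig <- sum(res$padj < 0.01, na.rm = TRUE)
}
```

The reduced formula is the full design with the term under test removed, so the p-values here refer to the `condition` term as a whole.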

Understanding collapseReplicates(): the term technical replicate implies multiple sequencing runs of the same library.

The normalization method; I do not fully understand it yet.
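
What collapsing does can be sketched in base R: counts from runs of the same library are summed. The counts and the names `r1`, `r2`, `r3`, `libA`, `libB` are made up for illustration; the DESeq2 call is guarded in case the package is not installed.

```r
# Made-up counts: runs r1 and r2 are two sequencing runs of library libA.
counts <- cbind(r1 = c(5, 0, 12), r2 = c(7, 1, 10), r3 = c(9, 2, 20))
rownames(counts) <- c("geneA", "geneB", "geneC")
storage.mode(counts) <- "integer"
library_id <- factor(c("libA", "libA", "libB"))

# Base-R version of the idea: sum the counts over runs of the same library.
collapsed <- t(rowsum(t(counts), group = library_id))

# The same operation on a DESeqDataSet.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  coldata <- data.frame(library_id = library_id, row.names = colnames(counts))
  dds <- DESeq2::DESeqDataSetFromMatrix(counts, coldata, design = ~ 1)
  dds_collapsed <- DESeq2::collapseReplicates(dds, groupby = dds$library_id)
}
```

Summing is appropriate only for technical replicates in the sense above; biological replicates should stay as separate columns.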

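# Note: newCountDataSetFromHTSeqCount() belongs to the original DESeq package, not DESeq2.
cds <- newCountDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory)
cds <- estimateSizeFactors(cds)    # per-sample size factors
cds <- estimateDispersions(cds)    # per-gene dispersions
# Compare each sample's size factor with its raw library size
data.frame(sizefactors=sizeFactors(cds), rawcounts=colSums(counts(cds, normalized=FALSE)))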
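
The size factors come from the median-of-ratios method. Below is a base-R sketch of that idea; the function name `estimate_size_factors` and the counts are made up for illustration, and this is not the package's own implementation.

```r
# Median-of-ratios size factors, written out by hand.
estimate_size_factors <- function(counts) {
  log_counts <- log(counts)

  # Log geometric mean of each gene across samples (a pseudo-reference sample).
  log_geo_means <- rowMeans(log_counts)

  # Genes with a zero count in any sample have a log geometric mean of -Inf; skip them.
  keep <- is.finite(log_geo_means)

  # For each sample: median of the log ratios to the reference, back on the count scale.
  apply(log_counts[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[keep])))
}

# Made-up counts: s2 is exactly twice s1, apart from one gene with a zero.
counts <- cbind(s1 = c(10, 100, 1000, 0), s2 = c(20, 200, 2000, 5))
size_factors <- estimate_size_factors(counts)

# Dividing by the size factors puts both samples on a common scale.
normalized <- sweep(counts, 2, size_factors, "/")
```

Because the median is used, a handful of strongly changed genes has little influence on a sample's size factor, which is the main difference from dividing by total library size.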

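Understanding the DESeq() function, quoted from an article by 青山屋主

DESeq() consists of three steps: estimation of size factors (estimateSizeFactors), estimation of dispersion (estimateDispersions), and Negative Binomial GLM fitting and Wald statistics (nbinomWaldTest). The steps can be run one at a time or all in a single call; either way the result is a DESeqDataSet object that results() can use.

Why DESeq2 sets p-value and padj to NA in its results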
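
The two ways of running the analysis can be sketched as follows, assuming DESeq2 is installed. The data are simulated with `makeExampleDESeqDataSet()`, so the numbers carry no biological meaning.

```r
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)

  # Simulated data set: 200 genes, 6 samples, design ~ condition.
  dds <- makeExampleDESeqDataSet(n = 200, m = 6)

  # Step by step.
  dds_steps <- estimateSizeFactors(dds)
  dds_steps <- estimateDispersions(dds_steps)
  dds_steps <- nbinomWaldTest(dds_steps)

  # All at once.
  dds_one <- DESeq(dds)

  # Both objects can be passed to results().
  res_steps <- results(dds_steps)
  res_one   <- results(dds_one)
}
```

Running the steps separately is mainly useful for inspecting intermediate values such as `sizeFactors(dds_steps)` before fitting the model.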

gmgitx / BLOG_natural_science

DESeq2研习.md #25

