TheJacksonLaboratory / sbas

CloudOS Digital Research Environment
Apply filtering before proceeding with analysis (from yarn paper) #37

Closed cgpu closed 4 years ago

cgpu commented 4 years ago

As highlighted in the current version of the manuscript:

[image: highlighted excerpt from the manuscript]

Two filtering steps must be performed right after fetching gtex.rds via yarn::downloadgtexV8().

We excluded the X, Y, and mitochondrial genes, and performed PCoA using Euclidean distance on the log2-transformed raw count expression data
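To make the quoted step concrete, here is a minimal Python sketch of the preprocessing it describes: drop genes on the X, Y, and mitochondrial chromosomes, log2-transform the raw counts, and build the sample-by-sample Euclidean distance matrix that a PCoA would take as input. It is not the notebook code. The toy gene names, the `chrX`/`chrY`/`chrM` labels, and the pseudocount of 1 (used to avoid log2(0)) are all assumptions for illustration, and the PCoA eigendecomposition itself is left out.

```python
import math

# Hypothetical toy data: gene -> (chromosome, raw counts per sample).
# The "chrX"/"chrY"/"chrM" labels are an assumed naming scheme.
counts = {
    "GENE_A": ("chr1", [100, 120, 90]),
    "GENE_B": ("chrX", [50, 400, 60]),
    "GENE_C": ("chrY", [0, 300, 0]),
    "GENE_D": ("chrM", [9000, 8500, 9100]),
    "GENE_E": ("chr7", [10, 12, 300]),
}

EXCLUDED = {"chrX", "chrY", "chrM"}


def filter_genes(table, excluded=EXCLUDED):
    """Keep only genes whose chromosome is not in `excluded`."""
    return {g: v for g, (chrom, v) in table.items() if chrom not in excluded}


def log2_transform(table, pseudocount=1.0):
    """log2(count + pseudocount); the pseudocount is an assumption."""
    return {g: [math.log2(c + pseudocount) for c in v] for g, v in table.items()}


def euclidean_distances(table):
    """Pairwise Euclidean distances between samples (columns)."""
    genes = sorted(table)
    n_samples = len(table[genes[0]])
    samples = [[table[g][i] for g in genes] for i in range(n_samples)]
    return [[math.dist(a, b) for b in samples] for a in samples]


filtered = filter_genes(counts)
distances = euclidean_distances(log2_transform(filtered))
```

The resulting `distances` matrix is symmetric with a zero diagonal. Passing a smaller `excluded` set (for example only `{"chrY"}`) changes which genes are dropped without touching the rest of the pipeline.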

adeslatt commented 4 years ago

Didn't see where this was actually done in the code. @karleg, your assistance here would be greatly appreciated to ensure that we have things in order.

karleg commented 4 years ago

We decided not to merge tissues in the new version

adeslatt commented 4 years ago

@karleg and @pnrobinson, could you comment on this:

Exclude genes on sex and mitochondrial chromosomes as they skew the variances
We excluded the X, Y, and mitochondrial genes, and performed PCoA using Euclidean distance on the log2-transformed raw count expression data

We don't actually do this for the gene expression analysis: we remove the Y chromosome in differentialSplicingJunctionAnalysis.ipynb but not in differentialGeneExpressionAnalysis.ipynb.

pnrobinson commented 4 years ago

The high variance of the X is biology, partially related to XX vs. one X, inactivation, escape, etc., so we need to keep it. Differences from the non-PAR regions of the Y are obvious, so we should exclude the Y from all analyses. So far I have not seen any difference from the mito, but there is no reason to exclude it either. We should probably filter the Y chromosomal results out of the DGE result!
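As a rough illustration of that last suggestion, the Python sketch below removes only the Y chromosomal rows from a DGE result table and keeps the X and mitochondrial rows. The row layout, the field names (`gene`, `chrom`, `logFC`), and the `chrY` label are hypothetical and are not taken from the notebooks.

```python
# Hypothetical DGE result rows; field names and values are illustrative only.
dge_results = [
    {"gene": "GENE_A", "chrom": "chr1", "logFC": 0.8},
    {"gene": "GENE_B", "chrom": "chrX", "logFC": -1.2},
    {"gene": "GENE_C", "chrom": "chrY", "logFC": 9.5},
    {"gene": "GENE_D", "chrom": "chrM", "logFC": 0.1},
]


def drop_chromosomes(rows, excluded=("chrY",)):
    """Return the rows whose chromosome is not listed in `excluded`."""
    return [row for row in rows if row["chrom"] not in excluded]


# Only the Y row is removed; X and mitochondrial rows are retained.
dge_without_y = drop_chromosomes(dge_results)
```

Keeping the excluded set as a parameter means the same helper could apply the Y-only policy to both the splicing and the gene expression results, assuming both tables carry a chromosome field.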