Closed cgpu closed 4 years ago
I didn't see where this was actually done in the code. @karleg, your assistance here would be greatly appreciated to ensure that we have things in order.
We decided not to merge tissues in the new version.
@karleg and @pnrobinson, could you comment on this:
Exclude genes on sex and mitochondrial chromosomes as they skew the variances
We excluded the X, Y, and mitochondrial genes, and performed PCoA using Euclidean distance on the log2-transformed raw count expression data.
We actually don't do this with the gene expression: we do remove the Y chromosome in differentialSplicingJunctionAnalysis.ipynb, but not in differentialGeneExpressionAnalysis.ipynb.
The high variance of the X is biology, partially related to XX vs. one X, inactivation, escape, etc., and so we need to keep it. Differences from the non-PAR regions of the Y are obvious, and so we should exclude the Y from all analyses. So far I did not see any difference from the mito, but there is no reason to exclude it either. We should probably filter out the Y chromosomal results from the DGE result!
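To make the proposal concrete, here is a minimal sketch of dropping Y-chromosome genes from a DGE result table while keeping X and mito. It is plain Python rather than the notebooks' own code, and the row layout (`gene`, `chr`, `logFC`), the chromosome labels, and the gene names are all hypothetical placeholders, not the real result format.

```python
# Hypothetical DGE result rows; field names and values are made up for illustration.
dge_results = [
    {"gene": "GENE_A", "chr": "chr1", "logFC": 0.8},
    {"gene": "GENE_B", "chr": "chrX", "logFC": 1.9},
    {"gene": "GENE_C", "chr": "chrY", "logFC": -6.2},
    {"gene": "GENE_D", "chr": "chrM", "logFC": 0.1},
]


def drop_chromosomes(rows, excluded=("chrY", "Y")):
    """Return the rows whose 'chr' value is not in `excluded`.

    Both 'chrY' and 'Y' are listed because annotation sources differ in
    whether they use the 'chr' prefix (an assumption about the input).
    """
    excluded = set(excluded)
    return [row for row in rows if row["chr"] not in excluded]


# Only the Y gene is removed; X and mitochondrial genes are kept.
filtered = drop_chromosomes(dge_results)
print([row["gene"] for row in filtered])
```

Keeping the excluded set as a parameter means the same helper could be reused for the splicing results, if we want both analyses to filter identically.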
As highlighted in the current version of the manuscript:
Two types of filtering out must be performed right after the fetching of gtex.rds via yarn::downloadgtexV8().
GTEX-11ILO - can be done with yarn::checkMisAnnotation()
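For reference, a rough sketch of what removing the GTEX-11ILO samples amounts to, independent of yarn. This is plain Python and not the yarn API. It assumes that each sample ID starts with the subject ID as its first two dash-separated fields, and the sample IDs and counts below are invented.

```python
# Subject flagged in this thread for removal.
MISANNOTATED_SUBJECTS = {"GTEX-11ILO"}


def subject_id(sample_id):
    """Return the subject part of a sample ID.

    Assumption: the subject ID is the first two dash-separated fields,
    e.g. 'GTEX-11ILO-0001-SM-AAAAA' -> 'GTEX-11ILO'.
    """
    return "-".join(sample_id.split("-")[:2])


def drop_subjects(counts_by_sample, subjects=MISANNOTATED_SUBJECTS):
    """Return a copy of the mapping without samples from the given subjects."""
    return {
        sample: counts
        for sample, counts in counts_by_sample.items()
        if subject_id(sample) not in subjects
    }


# Hypothetical sample IDs and counts, for illustration only.
counts_by_sample = {
    "GTEX-11ILO-0001-SM-AAAAA": [10, 0, 3],
    "GTEX-00000-0001-SM-BBBBB": [7, 2, 5],
}

kept = drop_subjects(counts_by_sample)
print(sorted(kept))
```

Matching on the subject ID rather than on a single sample ID removes every tissue sample from that donor in one pass.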