RobertsLab / resources

https://robertslab.github.io/resources/

Suspiciously perfect PCA from cod RNAseq data... #1856

Closed: shedurkin closed this issue 5 months ago

shedurkin commented 5 months ago

As part of my preliminary look at the cod RNAseq data, I made a PCA (as recommended in #1774) of the liver sample data (untrimmed), and the plot perfectly separates samples by temperature treatment, even preserving the temperature gradient! Honestly, it's freaking me out. Surely such a perfect PCA must be the result of an error somewhere. Does anyone have any thoughts?

Links: repo for all project code | code to generate PCA

[image: PCA plot of liver samples by temperature treatment]

laurahspencer commented 5 months ago

Your code looks good, and I'm not surprised that liver expression is strongly influenced by temperature. Very, very cool! My only suggestion is to re-run your PCA using more genes, which might reduce the clustering slightly. By default, the DESeq2 function plotPCA uses only the 500 most variable genes (ntop = 500). The code below should run the PCA using all genes in your dataset (but you can play around with the number passed to ntop=).

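```r
vsd_L <- vst(dds_L)  # DESeq2 variance-stabilizing transform, computed once and reused
pca_L <- plotPCA(vsd_L, intgroup = c("temp_treatment"), returnData = TRUE, ntop = nrow(vsd_L))  # ntop = every gene
```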
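For anyone wondering what ntop actually changes: plotPCA ranks genes by their variance across samples in the transformed data, keeps the top ntop, and runs a PCA on that subset. The base-R sketch below mirrors that selection step under the assumption that a plain numeric matrix (genes in rows, samples in columns) stands in for assay(vst(dds_L)). The helper name pca_top_genes and the simulated toy matrix are hypothetical, and this is not DESeq2's own code.

```r
# Sketch of plotPCA-style gene selection (assumption: approximates DESeq2's
# behaviour; pca_top_genes is a hypothetical helper, not a DESeq2 function).
# `mat` stands in for assay(vst(dds)): genes in rows, samples in columns.
pca_top_genes <- function(mat, ntop = 500) {
  # Per-gene variance across samples
  rv <- apply(mat, 1, var)
  # Keep the ntop most variable genes (all genes if ntop exceeds the count)
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  # Transpose so samples are rows; prcomp centres each gene by default
  pca <- prcomp(t(mat[keep, , drop = FALSE]))
  # Fraction of total variance captured by each principal component
  percent_var <- pca$sdev^2 / sum(pca$sdev^2)
  list(scores = pca$x, percent_var = percent_var, n_genes = length(keep))
}

# Toy usage with simulated values (hypothetical, not the cod dataset)
set.seed(1)
toy <- matrix(rnorm(2000 * 6), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), paste0("s", 1:6)))
res_default <- pca_top_genes(toy)             # top 500 genes, like the default
res_all     <- pca_top_genes(toy, nrow(toy))  # every gene, like ntop = nrow(...)
```

Passing ntop larger than the number of genes simply uses every gene, which is why nrow(...) is a convenient way to say "all of them".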

shedurkin commented 5 months ago

Oh, that's super helpful information about the plotPCA function, thank you! I reran it with a few different numbers of genes and got some much less perfect (but still interesting) plots!

[images: three PCA plots rerun with different numbers of genes]
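One rough way to compare reruns like these is to tabulate how much variance PC1 and PC2 explain at each ntop. In DESeq2 that figure is attached to the returnData = TRUE output as attr(pca_L, "percentVar"). The base-R sketch below computes the same kind of table without needing DESeq2 installed; the helper sweep_ntop, the simulated matrix standing in for the vst-transformed counts, and the ntop values are all hypothetical and only illustrative.

```r
# Hypothetical sketch: variance explained by PC1/PC2 across several ntop values.
# `mat` stands in for assay(vst(dds_L)); genes in rows, samples in columns.
sweep_ntop <- function(mat, ntops) {
  # Rank genes once by variance across samples, most variable first
  rv <- apply(mat, 1, var)
  ranked <- order(rv, decreasing = TRUE)
  rows <- lapply(ntops, function(n) {
    keep <- ranked[seq_len(min(n, length(ranked)))]
    pca <- prcomp(t(mat[keep, , drop = FALSE]))
    pv <- pca$sdev^2 / sum(pca$sdev^2)
    data.frame(ntop = n, n_genes = length(keep), PC1 = pv[1], PC2 = pv[2])
  })
  do.call(rbind, rows)
}

# Toy usage with simulated values (hypothetical, not the cod dataset)
set.seed(42)
toy <- matrix(rnorm(3000 * 8), nrow = 3000)
res <- sweep_ntop(toy, c(100, 500, 1000, nrow(toy)))
print(res)
```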