AlexsLemonade / compendium-processing

A series of analyses related to refine.bio species compendia
BSD 3-Clause "New" or "Revised" License

[QC PART 9] What samples comprise the two groups of RNA-seq samples? #23

Closed: jaclyn-taroni closed this 5 years ago

jaclyn-taroni commented 5 years ago

_I'm breaking up the files from jaclyn-taroni/qc_viz into smaller PRs. (That branch will be retained for posterity.)_

In 05-pca_test_compendium, we found what appear to be two groups of RNA-seq samples, separated along PC1. We initially suspected that the split reflected the selection strategy (e.g., poly-A enrichment vs. ribo-depletion), but the methods for a handful of samples did not support that explanation. In this notebook, I explored whether factors such as paired- vs. single-end sequencing, read length, or the inferred library type could account for the difference. It seems that one group is composed of large experiments (hundreds to thousands of samples each) from the Wellcome Sanger Institute Zebrafish Mutation Project, and these samples tend to have lower mapping rates than a random selection of samples from the other group.
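For readers who want to run a similar check, here is a minimal Python sketch of the comparison described above. It is not the notebook's code. The function names (`split_by_pc1`, `tabulate_factor`, `median_mapping_rate`), the metadata field names, the PC1 threshold of 0, and the toy values are all hypothetical, chosen only for illustration. The sketch assigns each sample to a group by its PC1 score, counts the levels of a categorical metadata factor within each group, and compares median mapping rates between the groups.

```python
from collections import Counter, defaultdict
from statistics import median


def split_by_pc1(pc1_scores, threshold=0.0):
    """Assign each sample to group 'A' (PC1 < threshold) or 'B' (PC1 >= threshold).

    The threshold is an assumption; in practice it would be picked by
    inspecting where the gap between the two clusters falls on PC1.
    """
    return {s: ("A" if score < threshold else "B") for s, score in pc1_scores.items()}


def tabulate_factor(groups, metadata, factor):
    """Count the levels of one categorical metadata factor within each PC1 group."""
    table = defaultdict(Counter)
    for sample, group in groups.items():
        table[group][metadata[sample][factor]] += 1
    return {group: dict(counts) for group, counts in table.items()}


def median_mapping_rate(groups, mapping_rates):
    """Median mapping rate (in percent) per PC1 group."""
    by_group = defaultdict(list)
    for sample, group in groups.items():
        by_group[group].append(mapping_rates[sample])
    return {group: median(rates) for group, rates in by_group.items()}


# Hypothetical toy data, not real compendium samples.
pc1 = {"s1": -5.2, "s2": -4.8, "s3": 3.1, "s4": 2.9}
meta = {
    "s1": {"layout": "PAIRED"},
    "s2": {"layout": "PAIRED"},
    "s3": {"layout": "SINGLE"},
    "s4": {"layout": "PAIRED"},
}
rates = {"s1": 55.0, "s2": 60.0, "s3": 85.0, "s4": 90.0}

groups = split_by_pc1(pc1)
print(tabulate_factor(groups, meta, "layout"))
print(median_mapping_rate(groups, rates))
```

A factor that lines up cleanly with the group labels in `tabulate_factor` is a candidate explanation for the split. A factor spread roughly evenly across both groups, as the selection strategy appeared to be here, is not.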