AlexsLemonade / scpca-docs

User information about ScPCA processing
https://scpca.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Create a getting started with scpca data section #59

Closed by allyhawkins 2 years ago

allyhawkins commented 2 years ago

Closes #52.

Here I'm adding the section on how to get started using an ScPCA dataset. I've included sections on importing the RDS files into R, quality control, normalization, downstream analysis, and working with Seurat.

In the first section I included a short introduction to the two RDS files that are present for each sample and linked out to other sections in the docs where necessary. We do have a FAQ that explains reading the files into R, but I remembered we talked about repeating some of that information, and it seemed like a necessary first step.

For the quality control section, I included information about using miQC and noted that the objects contain a miQC_pass column. I briefly explain what miQC is and link to the vignette and documentation. I also inserted code chunks to show how you would filter using that column.
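For anyone reading this thread without the docs open, a filtering chunk of the kind described above might look like the following minimal sketch. It is not copied from the docs: the object is simulated with scuttle::mockSCE() so the chunk runs on its own, and the miQC_pass values are made up for illustration. With real data, the object would instead come from readRDS() on one of the RDS files.

```r
# Minimal sketch: keep only the cells that pass miQC filtering.
# Assumption: a toy SCE stands in for a real ScPCA object, which would be
# loaded with something like sce <- readRDS("path/to/file.rds") (hypothetical path).
library(SingleCellExperiment)

set.seed(2022)
sce <- scuttle::mockSCE(ncells = 100, ngenes = 500)

# Made-up logical column standing in for the miQC_pass column in colData
sce$miQC_pass <- sample(c(TRUE, FALSE), ncol(sce), replace = TRUE, prob = c(0.8, 0.2))

# Columns of an SCE are cells, so subset columns on the logical vector.
# which() converts to indices and would also drop any NA values.
filtered_sce <- sce[, which(sce$miQC_pass)]

# Number of cells before and after filtering
ncol(sce)
ncol(filtered_sce)
```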

In addition to the miQC results, we also store the basic QC metrics in the SCE objects that we provide, and that's what is recommended in OSCA. I included a section on choosing hard cutoffs and tried to caution people to explore diagnostic plots before simply choosing cutoffs. That's why I didn't provide any recommendations on which cutoffs to use there. Some questions I had about the quality control section were:

For the normalization and feature selection sections, I generally tried to describe what we would recommend while also linking out to other resources so that readers can learn on their own beyond our recommendations. I went back and forth on including code chunks in these sections, since I don't want people to blindly follow this as a tutorial; I want them to actually read and stay informed. However, the code chunks seemed useful to me so that readers could follow along with each step. Do others have thoughts on this?

For the dimensionality reduction section, I included some detail but did not go too far into the weeds of how PCA and UMAP are calculated. That didn't seem necessary, but let me know if there should be more detail here.
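As a rough illustration of the steps discussed above (normalization, feature selection, PCA, UMAP), here is a minimal sketch using scuttle, scran, and scater on simulated data. It is an assumption about what such chunks could look like, not the code from the docs: it uses simple library-size normalization for brevity, which may differ from what the docs recommend, and the number of genes and components are arbitrary.

```r
# Minimal sketch of normalization -> feature selection -> PCA -> UMAP.
# Assumptions: simulated data from scuttle::mockSCE(); parameter values are
# arbitrary and chosen only so the example runs quickly.
library(SingleCellExperiment)
library(scran)
library(scater)

set.seed(2022)
sce <- scuttle::mockSCE(ncells = 200, ngenes = 1000)

# Normalization: log-transformed counts using library size factors
sce <- scuttle::logNormCounts(sce)

# Feature selection: model per-gene variance and keep the top variable genes
gene_variance <- scran::modelGeneVar(sce)
hvgs <- scran::getTopHVGs(gene_variance, n = 200)

# Dimensionality reduction: PCA on the selected genes, then UMAP on the PCs
sce <- scater::runPCA(sce, subset_row = hvgs, ncomponents = 20)
sce <- scater::runUMAP(sce, dimred = "PCA")

# Results are stored in the object as reduced dimensions
reducedDimNames(sce)
```

Setting a seed before running PCA and UMAP keeps the embedding reproducible between runs, which matters because UMAP involves random initialization.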

The last section I included is about using ScPCA data with Seurat. We already have a fairly extensive FAQ on converting SCE objects to Seurat objects, so I provided a link to that FAQ followed by tutorials on performing, in Seurat, the same steps that I described for working with SCE objects. Should there be more explanation here or more code chunks?

Some other overall questions:

allyhawkins commented 2 years ago

I went through and incorporated the suggestions from both @jaclyn-taroni and @jashapiro and made the following changes:

allyhawkins commented 2 years ago

I might reduce the UMAP caution section to just the crux, something like: "Use caution when interpreting UMAP results; the location, spacing, and density of the clusters can be dependent on parameter choices and random effects, and may not accurately reflect relationships among cells." I mostly want to get it down to 1-2 sentences, as a reminder of things they have hopefully heard before.

I went ahead and incorporated this suggestion, pared it down to one sentence, and incorporated the other minor suggestions. Let me know if this looks as expected.