Find cluster marker genes

lcolladotor commented 2 years ago

After #6, continue here.

Might be influenced by the discussion at https://github.com/LieberInstitute/spatial_hpc/issues/19, although I don't think that pseudo-bulking followed by limma, or pseudoBulkDGE(), is valid here.

It comes down to deciding whether to use findMarkers() (what is scoreMarkers()???) or @lahuuki's strategy for finding deconvolution marker genes. We know that @lahuuki's strategy is not ideal for highly related clusters, so we might use findMarkers() (although that function still has a ton of options!!).

lahuuki commented 2 years ago

For the Neuron 10x paper we used a model to control for donor. Should we do something similar for this data?

https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/51d15ef9f5f2c4c53f55e22e3fe467de1a724668/10x_DLPFC-n3_step03_markerDetxn_LAH.R#L69

https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/51d15ef9f5f2c4c53f55e22e3fe467de1a724668/10x_DLPFC-n3_step03_markerDetxn_LAH.R#L142-L163

lcolladotor commented 2 years ago

Hmm. We did a better job this time of removing donor effects with Harmony, so maybe not.

Just a quick reaction to the question. Worth thinking about it more!
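To make the donor question concrete, here is a minimal sketch of what "controlling for donor" buys you. It is not the model from the 10x pilot code. It only compares a naive pooled difference in mean log-expression with a difference computed within each donor and then averaged, which is roughly the spirit of the `block` argument in findMarkers(). The function names, donor IDs and toy values are all hypothetical.

```python
from statistics import mean


def pooled_lfc(values, clusters, target, other):
    """Difference in mean log-expression, ignoring donor entirely."""
    a = [v for v, c in zip(values, clusters) if c == target]
    b = [v for v, c in zip(values, clusters) if c == other]
    return mean(a) - mean(b)


def blocked_lfc(values, clusters, donors, target, other):
    """Compute the difference within each donor, then average across donors.

    Unweighted average for simplicity (an assumption of this sketch).
    Donors lacking nuclei from either cluster are skipped.
    """
    per_donor = []
    for donor in sorted(set(donors)):
        a = [v for v, c, d in zip(values, clusters, donors)
             if d == donor and c == target]
        b = [v for v, c, d in zip(values, clusters, donors)
             if d == donor and c == other]
        if a and b:
            per_donor.append(mean(a) - mean(b))
    if not per_donor:
        raise ValueError("no donor contains both clusters")
    return mean(per_donor)


# Toy gene with a donor baseline shift but no real cluster effect.
# Hypothetical donor "Br1" contributes mostly cluster A, "Br2" mostly cluster B.
donors = ["Br1"] * 4 + ["Br2"] * 4
clusters = ["A", "A", "A", "B", "A", "B", "B", "B"]
values = [2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0]

print(pooled_lfc(values, clusters, "A", "B"))           # 1.0
print(blocked_lfc(values, clusters, donors, "A", "B"))  # 0.0
```

The pooled difference looks like a marker only because cluster composition differs between donors. If Harmony has already balanced the clusters across donors, the two numbers should be close, which would be one way to check whether the extra model term is needed.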

lcolladotor commented 2 years ago

We've updated this to:

Specific resolution (29 `hc` clusters)

[x] findMarkers(pval = "all", direction = "up")
[x] findMarkers(pval = "all", direction = "up") Tran et al style (1 vs All) aka enrichment

We had thought of looking at findMarkers(pval = "any", direction = "up") within each broad cell type subset.
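For anyone following along, the difference between the two checked items can be sketched in a few lines. This is not what findMarkers() computes (it runs pairwise tests and combines p-values). The sketch only uses differences in mean log-expression, to show why a gene shared by two closely related clusters can look strong in the 1 vs All comparison yet weak when it has to beat every other cluster. Cluster names and values are made up.

```python
from statistics import mean


def cluster_means(values, labels):
    """Mean log-expression of one gene in each cluster."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return {lab: mean(vs) for lab, vs in groups.items()}


def pairwise_min_lfc(values, labels, target):
    """Worst case over all pairwise comparisons: the smallest difference
    between the target cluster mean and any other cluster mean."""
    means = cluster_means(values, labels)
    return min(means[target] - m for lab, m in means.items() if lab != target)


def one_vs_all_lfc(values, labels, target):
    """Enrichment style: target mean minus the mean of all other nuclei pooled."""
    inside = [v for v, lab in zip(values, labels) if lab == target]
    outside = [v for v, lab in zip(values, labels) if lab != target]
    return mean(inside) - mean(outside)


# Toy gene: high in Excit_A, nearly as high in the related Excit_B, absent in Oligo.
labels = ["Excit_A"] * 2 + ["Excit_B"] * 2 + ["Oligo"] * 6
values = [4.0, 4.0, 3.5, 3.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(pairwise_min_lfc(values, labels, "Excit_A"))  # 0.5
print(one_vs_all_lfc(values, labels, "Excit_A"))    # 3.125
```

The pooled comparison is dominated by the large Oligo group, so the gene scores well even though it barely separates Excit_A from Excit_B.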

Layer resolution

See column layer_level (EDIT: now we'll use layer_level_post_qc) at https://docs.google.com/spreadsheets/d/1exYhF31W9NPXi6uRLyjGPy1algG3GYpUlE9rxPbCNik/edit?usp=sharing, which is a combination of the broad cell type and the layer_label column from https://github.com/LieberInstitute/DLPFC_snRNAseq/blob/main/processed-data/05_explore_sce/spatial_registration_cor_details_hc.csv.

Note how this involves dropping about 2k excitatory nuclei (about 10% of the excitatory nuclei) compared to the other 2 resolutions.

[x] findMarkers(pval = "all", direction = "up")
[x] findMarkers(pval = "all", direction = "up") Tran et al style (1 vs All) aka enrichment
[ ] mean ratio

Depending on the results we get from findMarkers(), we might try findMarkers(pval = "any", direction = "up") within each broad cell type subset, mostly for the excitatory neurons.

Broad cell type resolution

[ ] mean ratio

@Nick-Eagles will do the mean ratio ones as those are the genes we want to use for deconvolution (potentially at https://github.com/LieberInstitute/spatialDLPFC). Louise will do the other ones.
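As a reference for the mean ratio items, here is a minimal sketch of the statistic as I understand it: the mean expression of a gene in the target cell type divided by its mean expression in the highest-expressing non-target cell type. That definition is an assumption on my part rather than something spelled out in this thread, and the function name and toy values are hypothetical.

```python
from statistics import mean


def mean_ratio(values, labels, target):
    """Target cell type mean divided by the highest non-target cell type mean.

    Assumed definition; expression values are expected to be non-negative.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    means = {lab: mean(vs) for lab, vs in groups.items()}
    highest_other = max(m for lab, m in means.items() if lab != target)
    if highest_other == 0:
        # Gene absent everywhere else: infinite ratio if expressed in target.
        return float("inf") if means[target] > 0 else 0.0
    return means[target] / highest_other


# Toy gene: strong in Oligo, low in Astro, lower still in Excit.
labels = ["Oligo"] * 3 + ["Astro"] * 3 + ["Excit"] * 3
values = [6.0, 6.0, 6.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0]

print(mean_ratio(values, labels, "Oligo"))  # 3.0
print(mean_ratio(values, labels, "Astro"))  # 0.3333333333333333
```

A ratio well above 1 means the gene stays higher in the target than in its closest competitor, which is the property we want for deconvolution markers.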

lcolladotor commented 1 year ago

I believe this has mostly been completed, right @lahuuki @nick-eagles?
