Choosing p_cut from volcano plot

BradBalderson / Cytocipher

Analysis methods for single-cell RNA-seq data, particularly for checking whether tentative clusters of cells differ significantly from one another in their gene expression.

GNU General Public License v3.0

Choosing p_cut from volcano plot #2

Closed: pakiessling closed this issue 1 year ago

pakiessling commented 1 year ago

Thanks for the tool; it looks very promising.

I am trying it out on my data, and the volcano plot looks like this for p_cut=0.045:

[Image: volcano plot]

I got more cells with negative log-FC than in your Pancreas example.

Does this suggest I should decrease p_cut (by a few orders of magnitude) to remove most of the negative log-FC values?
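A minimal sketch of what lowering p_cut does, assuming (as the merging step implies) that each point is one pairwise cluster comparison with a p-value, and that pairs with p >= p_cut count as not significantly different and become candidates for merging. The dict of p-values and the function `split_by_p_cut` are hypothetical, not part of Cytocipher. Because the rule is a single threshold, dropping p_cut from 0.045 to 0.00045 can only move pairs from "significant" to "mergeable", never the reverse, so it pushes towards fewer, larger clusters.

```python
from typing import Dict, List, Tuple

# Hypothetical input: one p-value per pairwise cluster comparison,
# keyed by (cluster_a, cluster_b). Real values would come from Cytocipher.
PairPvals = Dict[Tuple[str, str], float]


def split_by_p_cut(
    pvals: PairPvals, p_cut: float
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split cluster pairs into significantly different (p < p_cut)
    and not significant (p >= p_cut, i.e. assumed merge candidates)."""
    significant = sorted(pair for pair, p in pvals.items() if p < p_cut)
    mergeable = sorted(pair for pair, p in pvals.items() if p >= p_cut)
    return significant, mergeable


if __name__ == "__main__":
    # Toy p-values for illustration only, not real results.
    toy = {("0", "1"): 1e-6, ("0", "2"): 0.01, ("1", "2"): 0.2}
    for cut in (0.045, 0.00045):
        sig, merge = split_by_p_cut(toy, cut)
        print(f"p_cut={cut}: {len(sig)} significant, {len(merge)} mergeable")
```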

With p_cut = 0.00045:

[Image: volcano plot]

BradBalderson commented 1 year ago

Hi @pakiessling,

The volcano plot is informative to an extent, at least for checking how log-FC reflects cluster-pair significance. But I think the enrichment heatmap of the unmerged versus merged data is more informative.
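To see how log-FC and significance line up, the volcano axes can be reproduced by hand: x is the log-FC and y is -log10(p). The sketch below assumes a hypothetical list of `(log_fc, p_value)` tuples, one per cluster comparison, and counts how many significant points fall on each side of zero. `volcano_summary` is an illustrative helper, not Cytocipher's plotting code.

```python
import math
from typing import Dict, List, Tuple


def volcano_summary(points: List[Tuple[float, float]], p_cut: float) -> Dict[str, object]:
    """Return volcano coordinates (log_fc, -log10 p) and counts of
    significant points (p < p_cut) split by the sign of log-FC."""
    coords = [(lfc, -math.log10(p)) for lfc, p in points]
    significant = [lfc for lfc, p in points if p < p_cut]
    return {
        "coords": coords,
        "n_sig_pos": sum(1 for lfc in significant if lfc > 0),
        "n_sig_neg": sum(1 for lfc in significant if lfc < 0),
        "n_not_sig": len(points) - len(significant),
    }


if __name__ == "__main__":
    # Hypothetical comparisons for illustration only.
    pts = [(1.5, 1e-5), (-0.8, 1e-3), (-0.2, 0.3)]
    print(volcano_summary(pts, p_cut=0.045))
```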

I am wondering what this looks like for your data, e.g. with the code below:

import cytocipher as cc
cc.pl.enrich_heatmap(data, 'leiden_merged', figsize=(4, 4), scale_cols=True)  # 'leiden_merged': merged cluster labels
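We are not certain how `scale_cols=True` is implemented inside `enrich_heatmap`; a common choice, assumed here, is to min-max scale each column to [0, 1] so every score column uses the full colour range regardless of its raw magnitude. `minmax_scale_columns` is a hypothetical helper operating on a plain rows x columns list of enrichment scores.

```python
from typing import List


def minmax_scale_columns(matrix: List[List[float]]) -> List[List[float]]:
    """Scale each column of a rows x columns matrix to [0, 1].
    Constant columns are mapped to 0.0 to avoid division by zero."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    col_min = [min(row[j] for row in matrix) for j in range(n_cols)]
    col_max = [max(row[j] for row in matrix) for j in range(n_cols)]
    return [
        [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, col_min, col_max)
        ]
        for row in matrix
    ]


if __name__ == "__main__":
    # Toy scores: rows are cells (or clusters), columns are per-cluster scores.
    scores = [[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]]
    print(minmax_scale_columns(scores))
```

Scaling per column means a weak but specific score and a strong one look comparable on the heatmap, which makes block structure easier to read but hides absolute differences.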

pakiessling commented 1 year ago

With p_cut=0.00045 and 15 markers:

Before merge: [Image: enrichment heatmap]

After merge: [Image: enrichment heatmap]

[Image]

I think I actually like what it did; I just have to look more closely at the huge cluster 14.

BradBalderson commented 1 year ago

I think n_markers is far too high for code-scoring; please try 5-8. With those parameters, I think the result above is dramatically under-clustered.
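One rough intuition for why 15 markers can be too many (our own simplification, not Cytocipher's actual scoring): as we understand it, code-scoring rewards cells that co-express a cluster's marker genes. If each marker were detected independently with probability d, the chance that a cell detects all n of them is d^n, which falls quickly as n grows, so scores may become dominated by dropout. `p_all_detected` and the detection rate of 0.8 are hypothetical.

```python
def p_all_detected(detection_rate: float, n_markers: int) -> float:
    """Probability that all n markers are detected in one cell, assuming each
    is detected independently with the same probability (a simplification)."""
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be in [0, 1]")
    if n_markers < 0:
        raise ValueError("n_markers must be non-negative")
    return detection_rate ** n_markers


if __name__ == "__main__":
    d = 0.8  # hypothetical per-gene detection rate
    for n in (5, 8, 15):
        print(f"n_markers={n}: P(all detected) = {p_all_detected(d, n):.3f}")
```

With d = 0.8 this gives about 0.33 for 5 markers, 0.17 for 8 and 0.035 for 15.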

pakiessling commented 1 year ago

Thank you. I started with the default number of marker genes, but it kept forcing me to increase the number of markers because of identical patterns in clusters.
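If the warning is about two or more clusters ending up with the same top-n marker genes (an assumption; the exact trigger isn't stated in this thread), a check like the hypothetical `clusters_with_identical_markers` below lists which clusters collide. The input dict of marker lists is illustrative, not Cytocipher's internal format.

```python
from collections import defaultdict
from typing import Dict, List


def clusters_with_identical_markers(markers: Dict[str, List[str]]) -> List[List[str]]:
    """Group clusters whose marker gene sets are identical (order ignored).
    Returns only groups containing more than one cluster."""
    groups: Dict[frozenset, List[str]] = defaultdict(list)
    for cluster, genes in markers.items():
        groups[frozenset(genes)].append(cluster)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


if __name__ == "__main__":
    # Hypothetical top-2 markers per cluster.
    markers = {"0": ["GeneA", "GeneB"], "1": ["GeneB", "GeneA"], "2": ["GeneC", "GeneD"]}
    print(clusters_with_identical_markers(markers))
```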

BradBalderson commented 1 year ago

Ah right, sorry about this. That warning is not really an issue and can be safely ignored.

Another user raised this in the first issue; I have corrected it on the GitHub main branch but haven't yet updated the pip release.
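Because the fix is on the GitHub main branch but not yet in the pip release, it can help to check which version is installed before comparing behaviour. A minimal sketch using the standard library's `importlib.metadata`, assuming the PyPI distribution is named `cytocipher`; `installed_version` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it
    is not installed in the current environment."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("cytocipher:", installed_version("cytocipher"))
```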

Could we try with the lower number of marker genes anyway?

pakiessling commented 1 year ago

Looks much more reasonable with fewer markers, and also after cleaning the data more beforehand.