JSB-UCLA / scDEED

Single-cell dubious embedding detector (scDEED): a statistical method for detecting dubious non-linear embeddings
MIT License

How can I change the number of cells selected? #4

Closed by itarampoulous 6 months ago

itarampoulous commented 6 months ago

I am trying to use your package on a large dataset of around 47k cells, and scDEED seems to be sampling 4000 cells. The problem is that this sample doesn't cover all of my clusters, and some of the clusters I'm interested in aren't selected for the UMAP optimization. How can I change the number of cells selected?

Thanks, Isaak

clee700 commented 6 months ago

Hello Isaak, Thanks for your interest in scDEED. Could you please check:

  1. Do you have the latest version of the package? We did a large update ~4 days ago, and in that update, we removed the internal downsampling option. If you do have the latest version, the object returned from the scDEED function should be a list of 2 dataframes.

  2. If you do have the latest version, then I think maybe there is a misunderstanding about the output of scDEED. scDEED operates on the entire dataset and is not cluster specific. So it is entirely possible that some clusters don't have any dubious cells at all.

Please let me know if neither of these fixes the issue. Best, Christy

itarampoulous commented 6 months ago

Hi Christy,

I'm using the latest version and here is the output I get:

[Image: scDEEDoutput.png]

What do the "unselected" cells represent?

Thanks, Isaak

clee700 commented 6 months ago

Hi Isaak, The unselected cells are intermediate: their scores fall between the dubious and trustworthy cutoffs. They are still involved in the optimization process.

Best, Christy


itarampoulous commented 6 months ago

Hi Christy,

Thank you for your prompt response. What are the cutoffs for this? It's unclear how to interpret them. Is optimization performed on the number of dubious or dubious + intermediate cells?

Thanks, Isaak

clee700 commented 6 months ago

Hi Isaak, Optimization is performed on the number of dubious cells only. The default cutoffs are set through trustworthy_cutoff = 0.95 and dubious_cutoff = 0.05. These arguments are percentiles rather than raw score thresholds, similar to alpha (the significance level) in hypothesis testing, so the actual score cutoffs vary with the data. We use the similarity scores calculated on the permuted data as null scores. Since permutation disrupts cell-cell relationships, these scores represent the similarity between the pre- and post-embedding spaces that arises by chance.

Dubious cells are cells with similarity scores below the 5th percentile of the similarity scores calculated on the permuted data. Trustworthy cells are cells with similarity scores above the 95th percentile of the similarity scores calculated on the permuted data.
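To make the percentile logic concrete, here is a minimal Python sketch of the classification described above. It is an illustration only, not scDEED's R implementation: the helper names `percentile` and `classify_cells` and the toy score values are hypothetical, and only the 0.05 and 0.95 defaults come from this thread.

```python
def percentile(values, q):
    """Return the q-th quantile (0 <= q <= 1) of values, with linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def classify_cells(scores, null_scores, dubious_cutoff=0.05, trustworthy_cutoff=0.95):
    """Label each score against percentiles of the null (permuted-data) scores."""
    low = percentile(null_scores, dubious_cutoff)       # score threshold for "dubious"
    high = percentile(null_scores, trustworthy_cutoff)  # score threshold for "trustworthy"
    labels = []
    for s in scores:
        if s < low:
            labels.append("dubious")
        elif s > high:
            labels.append("trustworthy")
        else:
            labels.append("intermediate")  # between the two cutoffs
    return labels


# Toy null scores (assumed values standing in for scores from permuted data).
null_scores = [i / 100 for i in range(101)]  # 0.00, 0.01, ..., 1.00
# Toy similarity scores for three cells.
scores = [0.01, 0.50, 0.99]

labels = classify_cells(scores, null_scores)
print(labels)  # ['dubious', 'intermediate', 'trustworthy']
```

The "intermediate" label in this sketch corresponds to the cells described earlier in the thread as scoring between the dubious and trustworthy cutoffs.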

Increasing the dubious cell cutoff will result in more dubious cells: the score threshold for the 'dubious' classification is higher, so cells need higher scores to avoid it. Decreasing the dubious cell cutoff will result in fewer dubious cells: the threshold is lower, so cells must score very badly (below the dubious_cutoff percentile of the null scores) to be considered dubious.
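The effect of moving the dubious cutoff can be checked with a small Python sketch. This is a toy illustration under assumed data, not scDEED code: `percentile` and `count_dubious` are hypothetical helpers, and the score values are made up.

```python
def percentile(values, q):
    """Return the q-th quantile (0 <= q <= 1) of values, with linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def count_dubious(scores, null_scores, dubious_cutoff):
    """Count scores that fall below the dubious_cutoff percentile of the null scores."""
    threshold = percentile(null_scores, dubious_cutoff)
    return sum(1 for s in scores if s < threshold)


# Toy null scores (assumed values standing in for scores from permuted data).
null_scores = [i / 100 for i in range(101)]          # 0.00, 0.01, ..., 1.00
# Toy similarity scores for 20 cells: 0.025, 0.075, ..., 0.975.
scores = [(i + 0.5) / 20 for i in range(20)]

cutoffs = [0.01, 0.05, 0.20]
counts = [count_dubious(scores, null_scores, c) for c in cutoffs]
for c, n in zip(cutoffs, counts):
    print(f"dubious_cutoff={c:.2f} -> {n} dubious cells")
```

In this toy setup the count of dubious cells never decreases as the cutoff rises, which matches the behavior described above.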

Best, Christy