HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools

quick question about filtering of genes #100

Closed AnjaliC4 closed 2 years ago

AnjaliC4 commented 2 years ago

Hi, I have a quick question. In the pbDS function, line 156 reads: if (filter %in% c("genes", "both") & max(assay(y, k)) > 100). Could you please explain the purpose of requiring max counts > 100 before filtering genes with filterByExpr? I'm curious because I didn't find this criterion in the edgeR/limma manuals. I'm sure you set this for a good reason; I would just like to understand the reasoning, because for clusters where all counts are below 100, this check prevents filterByExpr from being applied.

Thanks.

HelenaLC commented 2 years ago

Huh, I totally get why this is confusing and seems arbitrary. There is really no "magic" here; rather, this is a hacky workaround for an issue I encountered during development:

Aggregation can use different summary statistics (say, sum, mean, or median) and different assay data (say, counts or expression-like values). Meanwhile, edgeR's filterByExpr() is designed for count-like data. So the > 100 check is a rough test of "Do these look like counts?" (well, sums of single-cell counts, really). Before this was in place, filterByExpr() would remove everything when aggregateData() had been called with, for example, the mean of logcounts. Hope that makes sense!
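To make the reasoning above concrete, here is a minimal base-R sketch on simulated data. It is an illustration under stated assumptions, not muscat's or edgeR's actual code: the helpers looks_like_counts() and keep_total() are hypothetical names, and keep_total() only mimics the total-count criterion of filterByExpr() (whose min.total.count argument defaults to 15), leaving out its CPM-based criterion.

```r
set.seed(1)
n_genes <- 200; n_samples <- 6; n_cells <- 50

# Simulated single-cell counts, arranged as genes x samples x cells.
sc <- array(rpois(n_genes * n_samples * n_cells, lambda = 2),
            dim = c(n_genes, n_samples, n_cells))

# Pseudobulk 1: sum of raw counts per gene and sample (count-like).
pb_sum <- apply(sc, c(1, 2), sum)

# Pseudobulk 2: mean of log-transformed values (not count-like).
pb_mean_log <- apply(log1p(sc), c(1, 2), mean)

# Hypothetical helper reproducing the heuristic discussed in this thread.
looks_like_counts <- function(x) max(x) > 100

# Hypothetical, simplified stand-in for one criterion of filterByExpr():
# keep genes whose total across samples reaches min_total.
keep_total <- function(x, min_total = 15) rowSums(x) >= min_total

looks_like_counts(pb_sum)       # summed counts pass the guard
looks_like_counts(pb_mean_log)  # mean logcounts do not

# Without the guard, a count-based filter would drop genes en masse
# from the mean-logcounts matrix, since its values are all small.
sum(keep_total(pb_sum))
sum(keep_total(pb_mean_log))
```

In this simulation every gene in the mean-logcounts matrix falls below the count-based threshold, which matches the "removes everything" behavior described above. The guard skips filtering in that case. The trade-off raised in the question remains: a genuine count matrix whose maximum is 100 or less also skips filtering.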