ImmuneDynamics / Spectre

A computational toolkit in R for the integration, exploration, and analysis of high-dimensional single-cell cytometry and imaging data.
https://immunedynamics.github.io/spectre/
MIT License

How to identify and remove "bad/outlier" FCS files objectively, before performing clustering? #148

Closed denvercal1234GitHub closed 1 year ago

denvercal1234GitHub commented 1 year ago

Hi there,

Thanks for the workflow.

flowAI can help remove technical errors by removing individual events, but it does not remove whole FCS files (http://127.0.0.1:18637/library/flowAI/doc/flowAI.html).

In my data, by manually examining the staining profiles, I identified some FCS files with an "outlier" staining pattern (across markers), and I aim to remove these files from the clustering analysis. Is there a way to objectively identify whole FCS files for removal instead of just deciding visually?

Or is it best practice to remove only outlier events and not the whole file?

Thank you again for your help.

ghar1821 commented 1 year ago

Not sure I understand what you mean by removing outlier events vs removing entire files. Do you have several outliers in one FCS file, or does the entire FCS file have a weird staining pattern?

If the latter, I suppose it makes sense to just not include that FCS file in your analysis.

If you want to remove just the outlier cells in a given FCS file, you can load it as a data.table first, then keep only the cells whose expression in a certain channel is below or above some value. Just a simple filter on the data.table like:

cell.dat <- cell.dat[Yb176_MHCII <= 10, ]
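
A slightly fuller sketch of the same idea, in case it helps: it filters on both a lower and an upper bound and records how many cells each file lost, so the exclusion can be reported. The toy data, the `FileName` column, and the `lower`/`upper` limits are assumptions made for illustration; they are not values taken from this thread.

```r
library(data.table)

# Toy stand-in for a cell-level table; real data would be read from the FCS files.
cell.dat <- data.table(
  FileName    = rep(c("A.fcs", "B.fcs"), each = 5),   # assumed column name
  Yb176_MHCII = c(0.5, 2, 4, 9, 12, 1, 3, 8, 11, 15)
)

# Hypothetical gate limits, chosen here for illustration only.
lower <- 1
upper <- 10

# Keep only cells whose MHCII signal lies inside [lower, upper].
filtered <- cell.dat[Yb176_MHCII >= lower & Yb176_MHCII <= upper]

# Count cells per file before and after filtering, then join the two counts.
removed <- cell.dat[, .(n_before = .N), by = FileName][
  filtered[, .(n_after = .N), by = FileName], on = "FileName"]
removed[, n_removed := n_before - n_after]

print(removed)
```

Keeping the per-file counts next to the filter makes it easy to state exactly what was dropped and why.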

denvercal1234GitHub commented 1 year ago

Thank you @ghar1821! A reviewer commented that if we simply remove FCS files after manually examining the staining profiles in FlowJo, instead of using some unbiased statistical modeling to first identify the troublesome files, then we are almost cherry-picking. What is your opinion?

Also, in my data the problem is not signal intensity above a certain value. Rather, the shape of certain populations in these FCS files is quite different from that in the rest of the files when viewed in the same channels. My data are spectral flow.

Thanks again!

tomashhurst commented 1 year ago

Hi @denvercal1234GitHub,

> A reviewer commented that if we simply remove the FCS files after manually examining the staining profile in FlowJo instead of some unbiased statistical modeling to first identify the troublesome FCS file, then we were almost Cherry-picking. What is your opinion?

I think the issue here is one of objectivity -- why exclude some files but not others? Excluding files because of cell quality issues (e.g. too many dead cells) or data quality issues (e.g. signal shifting over acquisition time) can be reasonable, but you would need an objective rationale for each exclusion that a reviewer can assess. However, if it is not a data quality issue, then those differences may well represent genuine biological differences, and the files are worth including.
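
One possible way to make such a rationale explicit is to compute a per-file QC summary and apply a rule that is fixed before looking at the results. The following is only a sketch on simulated data: the `CD3` marker, the file names, and the cutoff of 3 robust z-scores are assumptions for illustration, not a recommendation from this thread.

```r
# Simulated cell-level data: 20 files, one of which has a shifted marker.
set.seed(42)
n_files <- 20
cells_per_file <- 500
cell.dat <- data.frame(
  FileName = rep(sprintf("file_%02d.fcs", 1:n_files), each = cells_per_file),
  CD3      = rnorm(n_files * cells_per_file, mean = 2, sd = 0.5)
)

# Shift one file to mimic a staining problem.
bad <- cell.dat$FileName == "file_07.fcs"
cell.dat$CD3[bad] <- cell.dat$CD3[bad] + 1.5

# Per-file summary: median expression of the marker.
file_median <- tapply(cell.dat$CD3, cell.dat$FileName, median)

# Robust z-score of each file's median against all files (median and MAD
# are used so that the outlying files do not inflate the spread estimate).
robust_z <- (file_median - median(file_median)) / mad(file_median)

# Hypothetical, pre-specified cutoff; flag files beyond it.
threshold <- 3
flagged <- names(robust_z)[abs(robust_z) > threshold]

print(round(robust_z, 2))
print(flagged)
```

The same pattern could be applied to other per-file metrics, such as the fraction of dead cells, provided the metric and the cutoff are stated up front.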

Can you provide any more info?

denvercal1234GitHub commented 1 year ago

Thanks @tomashhurst. We have about 200 wells. Each well was stained with the same backbone antibodies, but each well has a different antibody in the PE channel. The files with different staining patterns for the non-PE markers turned out to be the wells with the highest PE staining intensity. It is therefore likely that the excessively high PE signal skewed the other channels in those wells.

Note that in our case, the PE-Abs for these wells were not titrated because they came preconjugated to the plate (we cannot change the amount).
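
The suspicion that high PE signal drives the shift in the other channels could be checked with a simple per-file association test. This is a sketch on simulated per-file summaries; `pe_median` and `backbone_median` are hypothetical vectors (one value per well), and the simulated spillover-like effect is an assumption built into the toy data.

```r
set.seed(7)
n_files <- 30

# Hypothetical per-file summaries (one value per well / FCS file).
pe_median <- rexp(n_files, rate = 1)   # PE intensity per file

# Simulate a backbone marker whose median creeps up with PE intensity.
backbone_median <- 2 + 0.3 * pe_median + rnorm(n_files, sd = 0.05)

# Deviation of each file's backbone median from the plate-wide median.
backbone_shift <- backbone_median - median(backbone_median)

# Rank-based (Spearman) test: does the shift grow with PE intensity?
assoc <- cor.test(pe_median, backbone_shift, method = "spearman")

print(assoc$estimate)
print(assoc$p.value)
```

A strong monotonic association across all wells would support a technical explanation tied to PE intensity rather than a decision made file by file.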

I hope to at least make some sort of PCA plot to see whether these FCS files cluster together, away from the rest of the hundreds of files. Do you have a script handy for this PCA check?

Thank you so much!

Below is an example of a file with very high PE staining and a file with not too high PE staining. The file with very high PE staining shows a diagonal shape that is likely a technical error, because the PE-Ab concentration in this well might be too high.

[Image: Screenshot 2023-01-31 at 15 40 20]
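
For the PCA check asked about above, a minimal base-R sketch might look like the following. It assumes a file-by-marker matrix of per-file median expression (here simulated; with real data it would be aggregated from the transformed cell-level table). The marker names, file names, and the cutoff of 5 are hypothetical, and this is not a Spectre function.

```r
set.seed(123)
n_files <- 40
markers <- c("CD3", "CD4", "CD8", "CD19", "CD45")   # hypothetical backbone markers

# Simulated file x marker matrix of median expression.
file_medians <- matrix(
  rnorm(n_files * length(markers), mean = 2, sd = 0.1),
  nrow = n_files,
  dimnames = list(sprintf("file_%02d.fcs", 1:n_files), markers)
)

# Shift three files across all markers to mimic the skewed wells.
odd <- c("file_05.fcs", "file_17.fcs", "file_33.fcs")
file_medians[odd, ] <- file_medians[odd, ] + 0.8

# PCA on the per-file summaries (centred and scaled per marker).
pca <- prcomp(file_medians, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Robust distance of each file from the centre of the PC1/PC2 cloud,
# using median and MAD per component.
centre <- apply(scores, 2, median)
spread <- apply(scores, 2, mad)
scaled <- sweep(sweep(scores, 2, centre), 2, spread, "/")
dist_robust <- sqrt(rowSums(scaled^2))

# Hypothetical cutoff, to be fixed before inspecting the results.
cutoff <- 5
flagged <- rownames(scores)[dist_robust > cutoff]
print(flagged)

# Plot only in an interactive session; flagged files are drawn in red.
if (interactive()) {
  plot(scores, pch = 19,
       col = ifelse(dist_robust > cutoff, "red", "grey40"),
       main = "Per-file PCA of median marker expression")
}
```

If the suspect files separate from the rest on the first components and exceed a cutoff chosen in advance, that gives a reviewer a stated criterion to assess rather than a visual judgment.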