ImmuneDynamics / Spectre

A computational toolkit in R for the integration, exploration, and analysis of high-dimensional single-cell cytometry and imaging data.
https://immunedynamics.github.io/spectre/
MIT License

Real or technical artifacts due to errors detected in UMAP following run.flowsom()? #161

Closed: denvercal1234GitHub closed this issue 3 months ago

denvercal1234GitHub commented 1 year ago

Hi there,

Thanks for the package.

When I used run.flowsom, even with meta.k=20 (automatic meta.k resolved 11 clusters), the cluster UMAP doesn't look "right": the clusters clump together on one side of the UMAP, and there are two tiny clusters that are far separated from everything else.

Do you think these 2 clusters (6 and 20) are simply artifacts? If so, would you mind suggesting some ways to assess that?

Thank you for your help!

[Image: Screenshot 2023-04-27 at 12 07 09]

Related to #154.

tomashhurst commented 1 year ago

Hi @denvercal1234GitHub, hmm, I get this on occasion. The issue is to do with the UMAP calculation, which should be independent of the FlowSOM clustering (i.e., even if you change the FlowSOM clustering, you will still get the same UMAP arrangement).

You might find that the cells on the far right are stacked at the maximum value across multiple channels, perhaps because of some kind of antibody aggregate bound to some cells or something similar. It would be worth doing some plotting to find out. If it is something technical like this, you could filter out those cells prior to analysis.
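As a rough illustration of the "stacked on the maximum value across multiple channels" check, here is a minimal sketch in plain Python. It is not Spectre code: the function name `flag_saturated_cells`, the `tolerance` and `min_channels` parameters, and the toy values are all assumptions made up for this example. The same logic should carry over to the data.table you pass to Spectre.

```python
def flag_saturated_cells(cells, channels, tolerance=0.05, min_channels=2):
    """Return indices of cells sitting at or near the per-channel maximum
    in at least `min_channels` channels.

    `cells` is a list of dicts mapping channel name -> expression value.
    `tolerance` is the fraction of each channel's range (max - min) that
    still counts as "at the maximum". Both names are hypothetical.
    """
    thresholds = {}
    for ch in channels:
        values = [cell[ch] for cell in cells]
        lo, hi = min(values), max(values)
        if hi > lo:  # skip constant channels, which carry no information
            thresholds[ch] = hi - tolerance * (hi - lo)

    flagged = []
    for i, cell in enumerate(cells):
        hits = sum(1 for ch, cut in thresholds.items() if cell[ch] >= cut)
        if hits >= min_channels:
            flagged.append(i)
    return flagged


# Toy data (made up): cell 2 is pinned near the top of every channel,
# cell 3 is high on CD19 only, which is what a real B cell might look like.
toy_cells = [
    {"CD3": 1.2, "CD4": 0.8, "CD19": 0.1},
    {"CD3": 0.9, "CD4": 1.1, "CD19": 0.2},
    {"CD3": 6.0, "CD4": 6.0, "CD19": 5.9},
    {"CD3": 0.2, "CD4": 0.1, "CD19": 6.0},
]
suspect = flag_saturated_cells(toy_cells, ["CD3", "CD4", "CD19"])
print(suspect)
```

Requiring several channels at once is the point of the check: a cell that is very bright on one marker may be real biology, whereas a cell pinned at the ceiling on markers that should not be co-expressed looks more like an aggregate. The flagged indices are only candidates to inspect, not cells to drop automatically.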

Tom

denvercal1234GitHub commented 1 year ago

Thanks @tomashhurst! This is very useful. Would you mind elaborating a bit more on which plots you would make to investigate whether these cells are technical artifacts? These 2 populations are actually what we expect to detect in our data based on RNAseq of the same samples.

tomashhurst commented 3 months ago

@denvercal1234GitHub sorry for the delayed reply here. Essentially an n x n set of biaxial plots: CD3 vs CD4, CD3 vs CD19, CD3 vs N, etc., with the metacluster plotted in each. You could also create a heatmap with make.pheatmap to see which markers are expressed most highly.
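The kind of summary such a heatmap displays can be sketched as a per-metacluster median of each marker. The snippet below is plain Python rather than Spectre code, and the `metacluster` key, the function name `median_by_cluster`, and the toy values are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median


def median_by_cluster(cells, markers, cluster_key="metacluster"):
    """Return {cluster: {marker: median expression}} for a list of cell dicts.

    `cluster_key` is a hypothetical column name holding the cluster label.
    """
    grouped = defaultdict(list)
    for cell in cells:
        grouped[cell[cluster_key]].append(cell)
    return {
        cluster: {m: median(c[m] for c in members) for m in markers}
        for cluster, members in grouped.items()
    }


# Toy data (made up): metacluster 6 is high on both markers at once.
toy_clustered = [
    {"metacluster": 1, "CD3": 1.0, "CD4": 0.5},
    {"metacluster": 1, "CD3": 1.5, "CD4": 0.75},
    {"metacluster": 1, "CD3": 1.25, "CD4": 1.0},
    {"metacluster": 6, "CD3": 5.5, "CD4": 6.0},
    {"metacluster": 6, "CD3": 6.5, "CD4": 5.0},
]
summary = median_by_cluster(toy_clustered, ["CD3", "CD4"])
for cluster, row in sorted(summary.items()):
    print(cluster, row)
```

A metacluster whose medians sit near the top of the scale on many unrelated markers is a candidate for an aggregate or other technical artifact, whereas one that is high only on a biologically coherent set of markers is more likely to be real.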

There will likely be a group of cells with ultra-high expression on a set of markers, possibly because they are some kind of aggregate. You could also plot them on tSNE or FItSNE, which should squish them into the 2D plot a bit better and might let you see how they compare to the remaining cells in the dataset.

I'll close this for now, but let us know how you get on.