projection plot and recalculated embeddings do not seem quite aligned

gianfilippo commented 6 months ago

Hi,

I have human TIL scRNAseq data (12 samples) stored as a list of Seurat (v4) objects: 6 samples in the "Pre" group and 6 samples in the "Post" group. I projected the data onto CD8T_human_ref_v1.rds and CD4T_human_ref_v1.rds (separately), then merged the projected objects. I then made a UMAP of the projection results using something like the code below:

    s <- subset(query.projected.merged.cd8, subset = Status == "Pre")
    p <- plot.projection(ref.cd8, query = s, linesize = 0.5, pointsize = 0.2)

The resulting plot is attached as CD4_Post.pdf.

I then split the merged object into a list of two objects, "Pre" and "Post", and applied recalculate.embeddings to each (using the same ref). The resulting plots (attached) do not seem quite aligned, especially the "Post" set, but I may be wrong and would like your input.

My understanding is that in my samples I have cell states that are not well represented in the ref I used.

Also, can you please explain the normalized silhouette score in compute_silhouette? What is it normalized by? What is a reasonable normalized score?

Thanks

Attachments: CD4_post.pdf, CD4_Post_recalculatedUMAP.pdf

gianfilippo commented 4 months ago

Hi,

Can you please comment on this?

Thanks

mass-a commented 4 months ago

Hi Gianfilippo, I'm sorry your query slipped under the radar.

I agree that something probably went wrong with the projection. I guess you have been following this case study? Just to clarify, did you merge the results of the two projections onto the two different maps? If so, that may lead to unpredictable behavior. I would keep the two analyses (CD4 and CD8) fully separate. Otherwise I don't see any obvious problem with your analysis. How large is your query dataset? There seem to be a lot of points from the query in your "recalculated" UMAP, perhaps even more than from the reference?

As for the silhouette coefficient, it is normalized by the silhouette coefficient of the reference. For a given cell state in the query, we calculate the silhouette coefficient of cells in this cell state against the remaining cells (reported as "Silhouette"); then this number is divided by the silhouette coefficient for the same cell state but calculated only on the cells of the reference map (reported as "Silhouette.norm"). This gives an indication of whether cells of a predicted cell state distribute similarly to those of the reference, or if they are more spread out in the combined space. I would say that roughly a normalized silhouette below 0.3-0.5 or so should make you suspect that a cell state was poorly predicted or that it's a new cell state not represented in the reference. Does it make sense?
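Written out as a formula, the normalization described above would look roughly as follows. This is only a sketch: the symbols are illustrative and not taken from the ProjecTILs source, and it assumes the standard per-cell silhouette width averaged over the cells of a state. How compute_silhouette handles details such as negative or capped values is not stated here, so treat those as open.

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
$$

$$
\mathrm{Silhouette}(k) = \frac{1}{|Q_k|} \sum_{i \in Q_k} s(i), \qquad
\mathrm{Silhouette}_{\mathrm{ref}}(k) = \frac{1}{|R_k|} \sum_{i \in R_k} s(i)
$$

$$
\mathrm{Silhouette.norm}(k) = \frac{\mathrm{Silhouette}(k)}{\mathrm{Silhouette}_{\mathrm{ref}}(k)}
$$

Here $a(i)$ is the mean distance from cell $i$ to the other cells of its own state, $b(i)$ is the mean distance to the cells of the nearest other state, $Q_k$ is the set of query cells predicted as state $k$, and $R_k$ is the set of reference cells annotated as state $k$ (with $s(i)$ for $R_k$ computed on the reference map only). Under this reading, a value near 1 means the query cells of state $k$ are about as compact as the reference cells of that state, and values well below 1 mean they are more spread out.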

Best -massimo

gianfilippo commented 4 months ago

Hi, I keep the CD4 and CD8 analyses separate. I have 6 individuals and 12 samples, i.e. 2 states for each individual, say "pre" and "post". The CD4 and CD8 data are each stored as a list of Seurat objects. I run make.projection and cellstate.predict on each list. I then merge the samples in each list and end up with two merged objects. I re-split each object according to status and run recalculate.embeddings on each resulting object (i.e. CD4.pre, CD4.post, CD8.pre, CD8.post) using umap.method="umap", resol=1.

From there I generate the plots.

What do you think?

Thanks

gianfilippo commented 4 months ago

I should add that if I use plot.projection with the original ref and the specific subset (e.g. CD8.pre, CD8.post, etc.) as query, the plot looks reasonable. In other words, in the original ref space, the plot looks ok.

Attachment: C4.post.pdf

In the combined space (previous plots), it does not.

mass-a commented 3 months ago

By your description the analysis seems reasonable, so the devil must be in the details. Perhaps some assays are not correctly set, or some parameters are off. Would you be able to share the full code and perhaps the source data? If it's sensitive, you can also send it privately.

Best -massimo

gianfilippo commented 2 months ago

Hi, thanks for your reply and sorry about the delay. I will ask about sharing the data and get back to you.

gianfilippo commented 2 months ago

Hi, how can I send you the data?

mass-a commented 2 months ago

Hi, you can contact me by email

carmonalab/ProjecTILs, issue #87