digitalcytometry / cytotrace2

CytoTRACE 2 is an interpretable AI method for predicting cellular potency and absolute developmental potential from scRNA-seq data.

batch correct #9

Closed hvgogogo closed 5 months ago

hvgogogo commented 5 months ago

Hi Cytotrace team,

Is there a built-in batch correction function in v2?

Many thanks

savagyan00 commented 5 months ago

Hi,

Thank you for your interest in CytoTRACE 2! Since the method predicts absolute rather than relative developmental potential, outputs are automatically calibrated into a consistent space suitable for comparison across datasets. If you wish to analyze multiple datasets together, simply run CytoTRACE 2 over each separately and aggregate the outputs directly. You can refer to item 4 in our FAQ section in the README for more details.
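
For readers following along, the "run each dataset separately, then aggregate" advice could be sketched roughly as below. This is only an illustration under stated assumptions: run_potency_stub() is a hypothetical stand-in so the snippet runs without the package installed, the toy matrices and scores are made up, and the genes-by-cells layout and the CytoTRACE2_Score column name are taken from this thread and the README rather than verified here. In real use the stub would be replaced by a cytotrace2() call on each dataset.

```r
# Hypothetical stand-in for a per-dataset cytotrace2() call (NOT the real
# function): returns one made-up score per cell, with cells as row names.
run_potency_stub <- function(expr) {
  data.frame(
    CytoTRACE2_Score = seq(0.2, 0.8, length.out = ncol(expr)),
    row.names = colnames(expr)
  )
}

# Two toy expression matrices (genes as rows, cells as columns), one per dataset.
datasets <- list(
  batch1 = matrix(1:6, nrow = 2,
                  dimnames = list(c("g1", "g2"), c("b1_c1", "b1_c2", "b1_c3"))),
  batch2 = matrix(1:4, nrow = 2,
                  dimnames = list(c("g1", "g2"), c("b2_c1", "b2_c2")))
)

# Score each dataset on its own and record which dataset every cell came from.
per_dataset <- lapply(names(datasets), function(nm) {
  res <- run_potency_stub(datasets[[nm]])
  res$dataset <- nm
  res
})

# Aggregate the per-dataset outputs directly, without any rescaling.
combined <- do.call(rbind, per_dataset)
print(combined)
```

Keeping a dataset column alongside the scores makes it easy to check later whether any apparent difference tracks the dataset of origin rather than the biology.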

Please let us know if you have any more questions!

vgettaa commented 5 months ago

Hi savagyan00, I have a dataset integrated from 6 samples with Harmony. I ran cytotrace2() and plotData() on each of the 6 samples separately and used two variables to collect values from the plotData() output, like this (I did not record CytoTRACE2-Relative, since it is not comparable across datasets according to item 4 in the FAQ):

result1 <- c(result1, plots[["CytoTRACE2_UMAP"]][[1]][["data"]][["CytoTRACE2_Score_clipped"]])
result2 <- c(result2, as.character(plots[["CytoTRACE2_Potency_UMAP"]][[1]][["data"]][["CytoTRACE2_Potency"]]))

Then I ran cytotrace2() and plotData() on the integrated data and assigned result1, result2, and the cell embeddings to the object returned by plotData(), like this:

plots[["CytoTRACE2_UMAP"]][[1]][["data"]][["umap_1"]] <- Seuratobj_integrated@reductions[["umap.harmony"]]@cell.embeddings[, 1]
plots[["CytoTRACE2_UMAP"]][[1]][["data"]][["umap_2"]] <- Seuratobj_integrated@reductions[["umap.harmony"]]@cell.embeddings[, 2]
plots[["CytoTRACE2_UMAP"]][[1]][["data"]][["CytoTRACE2_Score_clipped"]] <- result1

Is it correct to use CytoTRACE 2 on integrated data like this? Thanks in advance!

savagyan00 commented 5 months ago

Hi, thanks for using CytoTRACE 2 and for reaching out!

Our general recommendation is to run CytoTRACE 2 separately for each dataset without integrating them (you can refer to item 4 in our FAQ section for more details). From your message, it sounds like you’ve already done this and now want to visualize the results on an integrated UMAP embedding.

To proceed, you should first gather the individual dataset predictions from the output of the cytotrace2() function (i.e., cytotrace2_result$CytoTRACE2_Score). You can then merge these scores into a single dataframe, and this merged set of scores can be plotted on top of your Harmony-integrated UMAP embedding for a comprehensive visual representation.
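
As a rough sketch of the merge-and-overlay step described above: the snippet below uses small mock data frames in place of real cytotrace2() outputs and a mock matrix in place of the Harmony UMAP coordinates (all values and cell names are invented for illustration). The one design choice worth noting is that scores are matched to the embedding by cell name rather than by position, since the cell order in an integrated object is not guaranteed to equal the order in which per-sample results were concatenated.

```r
# Mock stand-ins for two per-sample cytotrace2() outputs (hypothetical values);
# row names are cell IDs, as assumed from the discussion in this thread.
res_a <- data.frame(CytoTRACE2_Score = c(0.91, 0.40),
                    row.names = c("A_cell1", "A_cell2"))
res_b <- data.frame(CytoTRACE2_Score = c(0.15, 0.62),
                    row.names = c("B_cell1", "B_cell2"))

# Mock stand-in for the integrated embedding, e.g. the matrix stored at
# Seuratobj_integrated@reductions[["umap.harmony"]]@cell.embeddings.
# Note the cells are deliberately in a different order than the results.
umap <- matrix(c(1.0, -0.5,  2.2, 0.3,
                 0.4,  1.1, -1.3, 0.8),
               ncol = 2,
               dimnames = list(c("B_cell1", "A_cell1", "B_cell2", "A_cell2"),
                               c("umap_1", "umap_2")))

# Merge the per-sample scores into a single data frame.
merged <- rbind(res_a, res_b)

# Look up each embedded cell's score by name; fail loudly if any cell is missing.
idx <- match(rownames(umap), rownames(merged))
stopifnot(!anyNA(idx))

# One row per cell: integrated coordinates plus the separately computed score.
plot_df <- data.frame(umap, CytoTRACE2_Score = merged$CytoTRACE2_Score[idx])
print(plot_df)
```

The resulting plot_df can then be handed to any plotting routine, with umap_1 and umap_2 as coordinates and CytoTRACE2_Score as the color variable.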

It's important to note that Harmony only adjusts the cell embeddings and not the underlying expression data. Therefore, when you use the plotData() function with your integrated data object, it will internally generate a UMAP embedding without batch correction applied to the gene expression data.

Let us know if you need help plotting the CytoTRACE 2 predictions on your integrated UMAP embeddings, and we'll be glad to assist!

vgettaa commented 5 months ago

Hi @savagyan00, thanks for your reply! Do you mean that I can run cytotrace2() on each sample separately, merge the scores (cytotrace2_result$CytoTRACE2_Score) into a single dataframe, and then use the Harmony-integrated UMAP embedding together with the merged scores to visualize the potency categories (the value range from 0 to 1 described under "CytoTRACE 2 outputs - CytoTRACE 2 cell potency predictions", https://github.com/digitalcytometry/cytotrace2?tab=readme-ov-file#input-files)? I think the code I used above is similar to what you describe: I merged plots[["CytoTRACE2_UMAP"]][[1]][["data"]][["CytoTRACE2_Score_clipped"]] across samples and visualized the merged vector on the Harmony-integrated UMAP embedding. If I am right, is plots[["CytoTRACE2_UMAP"]][[1]][["data"]][["CytoTRACE2_Score_clipped"]] derived from cytotrace2_result$CytoTRACE2_Score by some transformation (such as a monotonic transformation)?

savagyan00 commented 5 months ago

Hi,

Yes, that's right. The clipped values are an internal adjustment of the CytoTRACE 2 Score values, made so that the colors display correctly in the CytoTRACE2_UMAP output of the plotData function. You can get the unclipped values from the cytotrace2() function output without running plotData at all; however, if you want to replicate the CytoTRACE 2 plot style exactly, then yes, you can use the clipped values together with the code chunk that generates potency_score_umap in the plotData function.

Let us know if we can help with anything else!

vgettaa commented 5 months ago

Thanks! I think I can add the plot to my results.

shihsama commented 1 month ago

Hi savagyan! I'm working with multiple integrated datasets, too. Regarding vgettaa's workflow, I have also considered aggregating the CytoTRACE2_Score values, but I'm concerned that scores generated separately for each dataset may not be comparable when taken together. Additionally, we noticed that the ordering of cell subtypes in the box plots differs slightly between datasets, which has made the results harder to interpret. How should we resolve this? Or should we use the scores only to confirm the starting point of development and leave the rest to trajectory inference algorithms?