SCA-IRCM / SingleCellSignalR_v1

R package

Long computation time for cluster_analysis #12

Status: Open. Opened by ccruizm 4 years ago.

ccruizm commented 4 years ago

Good day,

Thanks for developing this great tool. I have tried to apply it to my dataset, but generating the DE analysis with cluster_analysis takes a very long time. I pre-processed all my data with Seurat and followed your recommendation in Supplementary Box 2. However, if I run cell_signaling without prior information on the DE genes per cluster, I don't get the cluster-specific interactions.

My dataset has about 35K cells. Running on an HPC node with 64 cores and 360 GB of memory, it took 24 h to generate the txt file of DE genes for just one cluster, which seems excessively long. Is there a way to optimize it? Can I run the DE tests in Seurat and use those results in cell_signaling? Is there any other way to speed up the analysis?

Thanks in advance

SCA-IRCM commented 4 years ago

Hi, you can use the DE genes found with Seurat. The important points are:

1. Create a folder named cluster-analysis.
2. In it, write one text file per cluster named table_dge_cluster X.txt. The "cluster X" part can be replaced by the cluster name, e.g. "T-cells" (giving table_dge_T-cells.txt), but that name must then also be passed to the cell_signaling function through the c.names argument.
3. Each text file must have at least two columns, named genes and logFC.
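Before calling cell_signaling, it can help to check that each hand-written file meets the column requirement above. The sketch below uses only base R and assumes tab-separated files with a header row. `check_dge_table()` is a hypothetical helper, not part of SingleCellSignalR, and the check that logFC is numeric is an extra sanity assumption rather than a documented requirement.

```r
# Hypothetical helper (not part of SingleCellSignalR): checks that a DE table
# written for cell_signaling has the required "genes" and "logFC" columns.
check_dge_table <- function(path) {
  tab <- read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)
  missing <- setdiff(c("genes", "logFC"), colnames(tab))
  if (length(missing) > 0) {
    stop("missing column(s): ", paste(missing, collapse = ", "))
  }
  # Assumption: fold changes should be read back as numbers
  if (!is.numeric(tab$logFC)) {
    stop("logFC column is not numeric")
  }
  invisible(TRUE)
}

# Example: write a small table into a temporary cluster-analysis folder
out.dir <- file.path(tempdir(), "cluster-analysis")
dir.create(out.dir, showWarnings = FALSE)
dge.file <- file.path(out.dir, "table_dge_cluster 1.txt")
write.table(data.frame(genes = c("CD3E", "IL7R"), logFC = c(1.2, 0.8)),
            dge.file, sep = "\t", quote = FALSE, row.names = FALSE)
check_dge_table(dge.file)
```

The same check can be run over every file returned by `list.files(out.dir, full.names = TRUE)`.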

Here is an example with the 10x PBMC dataset used in the demo (https://github.com/SCA-IRCM/Demo). After clustering with

  pbmc <- FindClusters(pbmc, resolution = 0.1)

you can compute the markers of every cluster:

  pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

Then, from this table, do the following for each cluster (here cluster i):

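  library(data.table)  # provides fwrite()
  # i is one Seurat cluster ID, e.g. i <- "0"
  j <- as.numeric(as.character(i)) + 1  # Seurat numbers clusters from 0, SingleCellSignalR from 1
  tmp <- pbmc.markers[pbmc.markers$cluster == i, ]
  # Seurat 3 names the fold-change column avg_logFC; Seurat 4 and later use avg_log2FC
  fc <- if ("avg_log2FC" %in% colnames(tmp)) tmp$avg_log2FC else tmp$avg_logFC
  tmp <- data.frame(genes = tmp$gene, logFC = fc)  # keeps logFC numeric (cbind() would coerce it to character)
  fwrite(tmp, paste0("../Demo/cluster-analysis/table_dge_cluster ", j, ".txt"), sep = "\t")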

Note that Seurat numbers clusters from 0 while SingleCellSignalR numbers them from 1, hence the +1 when computing j.
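To write one file per cluster in a single pass, the per-cluster steps can be wrapped in a loop. This is a minimal sketch, assuming a marker table with FindAllMarkers-style columns `gene`, `cluster` and `avg_log2FC` (or `avg_logFC` in Seurat 3) and numeric Seurat cluster IDs. The helper name `write_dge_tables()` and the small `markers` data frame are hypothetical stand-ins, and base `write.table()` replaces `fwrite()` so the sketch needs no extra package.

```r
# Hypothetical helper: writes one "table_dge_cluster <j>.txt" file per Seurat
# cluster, shifting Seurat's 0-based cluster IDs to SingleCellSignalR's 1-based ones.
write_dge_tables <- function(markers, out.dir) {
  dir.create(out.dir, showWarnings = FALSE, recursive = TRUE)
  # Seurat >= 4 names the fold-change column avg_log2FC; Seurat 3 uses avg_logFC
  fc.col <- intersect(c("avg_log2FC", "avg_logFC"), colnames(markers))[1]
  stopifnot(!is.na(fc.col))
  files <- character(0)
  for (i in levels(factor(markers$cluster))) {
    j <- as.numeric(i) + 1  # assumes numeric cluster IDs such as "0", "1", ...
    tmp <- markers[markers$cluster == i, ]
    out <- data.frame(genes = tmp$gene, logFC = tmp[[fc.col]])
    f <- file.path(out.dir, paste0("table_dge_cluster ", j, ".txt"))
    write.table(out, f, sep = "\t", quote = FALSE, row.names = FALSE)
    files <- c(files, f)
  }
  files
}

# Toy stand-in for FindAllMarkers() output (two clusters, numbered from 0)
markers <- data.frame(
  gene = c("IL7R", "CCR7", "CD14", "LYZ"),
  cluster = factor(c("0", "0", "1", "1")),
  avg_log2FC = c(1.1, 0.9, 2.3, 1.7)
)
files <- write_dge_tables(markers, file.path(tempdir(), "cluster-analysis"))
basename(files)
```

With the demo object, `write_dge_tables(pbmc.markers, "../Demo/cluster-analysis")` should produce files in the same format as the per-cluster snippet.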

cell_signaling will then read these files and annotate an interaction as specific if both the ligand and the receptor are differentially expressed in their respective clusters.
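The specificity rule just described can be illustrated on a toy case. This sketch reproduces only the stated rule, not SingleCellSignalR's internal code. `is_specific()`, the gene lists and the ligand/receptor pairs are hypothetical, and it assumes each cluster's DE table has already been read into a data frame with a `genes` column.

```r
# Hypothetical illustration of the rule: an interaction is flagged as specific
# when the ligand is DE in the sending cluster AND the receptor is DE in the
# receiving cluster. Vectorized over ligand/receptor pairs.
is_specific <- function(ligand, receptor, dge.sender, dge.receiver) {
  ligand %in% dge.sender$genes & receptor %in% dge.receiver$genes
}

# Toy DE tables, shaped like the table_dge_* files
dge <- list(
  "cluster 1" = data.frame(genes = c("IL7R", "CCL5"), logFC = c(1.1, 0.9)),
  "cluster 2" = data.frame(genes = c("CCR5", "CD14"), logFC = c(1.5, 2.3))
)
pairs <- data.frame(ligand = c("CCL5", "CCL3"), receptor = c("CCR5", "CCR5"))
pairs$specific <- is_specific(pairs$ligand, pairs$receptor,
                              dge[["cluster 1"]], dge[["cluster 2"]])
pairs
```

Here CCL5-CCR5 is flagged as specific, while CCL3-CCR5 is not, because CCL3 is absent from the sender's DE table.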

I hope this works for you; if it doesn't, send me a message with more details.

Thanks for using SingleCellSignalR.

SCA