Open FerdinandoPucci opened 3 years ago
Hello,
First you can try this before re-launching the clustering function:
Feel free to contact me again if it doesn't work.
Thanks for using SingleCellSignalR!
SCA
Thanks SCA!
I removed the NAs with
ND_merged[is.na(ND_merged)] <- 0
and there are no zero-filled lines
> sum(apply(ND_merged_norm, 1, sum)==0)
[1] 0
data_prepare() generated ND_merged_norm from ND_merged.
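For anyone following along, the two checks above (replacing NAs and counting all-zero rows) can be reproduced on a toy matrix. This is a minimal base-R sketch: the matrix `toy` and its values are made up for illustration, and `rowSums()` is used here as an equivalent to `apply(x, 1, sum)`.

```r
# Toy genes-by-cells matrix with one NA and one all-zero gene (made-up values).
toy <- matrix(c(1, NA, 3,
                0,  0, 0,
                2,  5, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              c("cell1", "cell2", "cell3")))

# Replace missing values with 0, as done for ND_merged above.
toy[is.na(toy)] <- 0

# Count genes (rows) whose total expression is zero.
n_zero_rows <- sum(rowSums(toy) == 0)

# Optionally drop those rows before clustering.
toy_filtered <- toy[rowSums(toy) > 0, , drop = FALSE]
```

On a real dataset, `toy` would be replaced by the matrix being clustered.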
However:
> ND_clusters <- clustering(ND_merged_norm, n=10)
Estimating the number of clusters
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: vector memory exhausted (limit reached?)
In addition: Warning messages:
1: In for (i in 1L:d2) { : closing unused connection 5 (<-localhost:11977)
2: In for (i in 1L:d2) { : closing unused connection 4 (<-localhost:11977)
3: In for (i in 1L:d2) { : closing unused connection 3 (<-localhost:11977)
Even if:
> mem.maxVSize()
[1] 131072
Is it normal that clustering() requires so much RAM/swap? Thanks
Hi,
The clustering() function uses SIMLR to estimate the number of clusters and to cluster the cells. SIMLR performs very well, but it is also very memory-hungry on large datasets (over 5000 cells).
For your dataset and on your system, you can set the n.cluster argument to an arbitrary value in order to skip the "Estimating the number of clusters" step, and set the method argument to "kmeans".
This should work and produce a 2D t-SNE map on which you can visualize your data and estimate the number of clusters yourself. Then re-run the analysis with n.cluster set to the number you estimated.
It is a bit tedious but it should work :).
Thanks again for using SingleCellSignalR!
SCA
Thank you so much, it worked. I estimated the number of clusters with Loupe browser.
> ND_clusters <- clustering(ND_merged, n.cluster=15, method="kmeans")
15 clusters detected
cluster 1 -> 220 cells
cluster 2 -> 398 cells
cluster 3 -> 220 cells
cluster 4 -> 43 cells
cluster 5 -> 100 cells
cluster 6 -> 1914 cells
cluster 7 -> 2694 cells
cluster 8 -> 2 cells
cluster 9 -> 2586 cells
cluster 10 -> 188 cells
cluster 11 -> 2419 cells
cluster 12 -> 239 cells
cluster 13 -> 9 cells
cluster 14 -> 521 cells
cluster 15 -> 83 cells
Warning message:
Quick-TRANSfer stage steps exceeded maximum (= 581800)
However, the cell_signaling() function does not find any cellular interaction. I think this is quite unlikely, as all these cells come from the same organ (lymph node). I am wondering whether merging with bulk RNA-seq data is the cause. I have 2 "wet lab" clusters (purified cell populations sequenced in bulk) that I need to add to the scRNA-seq data. This is because 10x and other scRNA-seq procedures are known to miss the more "delicate" cell types (they get destroyed in the GEM phase), such as macrophages/dendritic cells, senescent cells, ... But bulk RNA-seq values are much higher than scRNA-seq values:
> range(Bulk.ND)
[1] 0 2108283
> range(ND_matrix_norm) #scRNA dataset
[1] 0.00000 9.98179
Maybe I should use the Zscore or similar?
Thank you
Hi,
I tend to think that merging bulk and single-cell RNA-seq data is not a good idea; you should analyze the two datasets separately (see https://www.thno.org/v10p4383 for a method using bulk RNA-seq). However, it is probably not the reason why you don't see any interactions.
By default, the cell_signaling() function computes only the "pure" paracrine interactions, meaning that the corresponding receptor is not expressed by the cells that express the ligand (see the supplementary figures in the paper for more details). Finding no such interactions usually happens when the cell types are closely related, which seems to be your case.
You can see this if you set the int.type argument to "autocrine"; it should return a lot of interactions.
If you're interested (as I think you are) in the communication between the different cell types, you can try adjusting the tol argument (see details). For example, if you set tol=0.05, you allow 5% of the cells expressing the ligand to also express the receptor.
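To make the meaning of tol concrete, here is a toy base-R calculation of the quantity described above: the fraction of ligand-expressing cells that also express the receptor. It only illustrates the idea and is not the package's internal implementation. The expression vectors are made up, and treating "expressed" as "value > 0" is an assumption.

```r
# Made-up expression values for 10 cells of one cluster.
ligand   <- c(2.1, 0, 1.4, 3.0, 0, 0.8, 1.1, 0,   2.5, 1.9)
receptor <- c(0,   0, 0,   1.2, 0, 0,   0,   0.5, 0,   0)

# Cells expressing the ligand (assumption: expressed means value > 0).
ligand_cells <- ligand > 0

# Fraction of ligand-expressing cells that also express the receptor.
frac_both <- mean(receptor[ligand_cells] > 0)

# Under the reading given above, tol = 0.05 tolerates at most 5% of such
# cells; here the fraction is larger, so a higher tol would be needed.
tol <- 0.05
within_tolerance <- frac_both <= tol
```

In this toy case one of the seven ligand-expressing cells also expresses the receptor, so the pair would not pass at tol=0.05.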
Keep me posted if it solves your problem.
Hope this helps.
SCA
It worked! Thank you! Very few interactions with tol=0.05, I will try increasing it.
However, it does not consider the 2 extra clusters (bulk RNA seq data of purified cell subsets) I manually added to the list generated by clustering().
> ND_clusters_copy$numbers
[1] 998 294 355 821 1081 935 781 933 10 824 1649 303 443 307 1900 1 1
It does look for DGE for clusters 16 and 17:
...
No such file as table_dge_cluster 17.txt in the cluster-analysis folder
...
I guess not finding the DGE table is normal, as it also happened in the vignette on Bioconductor.
But:
...
0 No significant interaction found from cluster 1 to cluster 15
0 No significant interaction found from cluster 2 to cluster 1
...
A bit more info:
> ND_signal <- cell_signaling(data = ND_merged, genes = rownames(ND_merged), cluster = ND_clusters_copy$cluster, write = FALSE)
> nrow(ND_merged)
[1] 21532
> length(rownames(ND_merged))
[1] 21532
> ncol(ND_merged)
[1] 11636
> length(ND_clusters_copy$cluster)
[1] 11636
> nrow(ND_clusters_copy$'t-SNE')
[1] 11636
> length(ND_clusters_copy$numbers)
[1] 17
> ND_clusters_copy$cluster[(length(ND_clusters_copy$cluster)-5):length(ND_clusters_copy$cluster)]
V11924 V11925 V11926 V11927 SSM MSM
4 15 7 11 16 17
I hope I edited the list generated by clustering() in the right way. The number of columns of data (11636) matches the length of the cluster vector, as does the number of rows with rownames, as specified in the documentation. Maybe I do not have official HUGO gene symbols?
Thanks for helping!
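The manual edit described above (appending bulk samples as extra clusters) can be sanity-checked with a few base-R comparisons. This is a sketch on a toy stand-in: the field names `cluster` and `numbers` follow the output shown above, while the object names `clu` and `expr`, the sample name `bulk1`, and all values are hypothetical.

```r
# Toy stand-in for the list returned by clustering(); values are made up.
clu <- list(cluster = c(c1 = 1, c2 = 2, c3 = 1, c4 = 2, c5 = 1),
            numbers = c(3, 2))
expr <- matrix(0, nrow = 4, ncol = 5,
               dimnames = list(paste0("gene", 1:4), paste0("c", 1:5)))

# Append one bulk sample as its own cluster (hypothetical name "bulk1").
expr <- cbind(expr, bulk1 = 0)
clu$cluster <- c(clu$cluster, bulk1 = max(clu$cluster) + 1)

# Recompute the per-cluster counts instead of editing them by hand.
clu$numbers <- as.vector(table(clu$cluster))

# Consistency checks between the expression matrix and the cluster list.
same_length <- length(clu$cluster) == ncol(expr)
same_order  <- identical(names(clu$cluster), colnames(expr))
counts_ok   <- sum(clu$numbers) == ncol(expr)
```

If all three checks are TRUE, the cluster vector at least lines up with the columns of the data matrix. Whether single-sample clusters can yield significant interactions is a separate question.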
Increasing tol still does not detect enough interactions. Any suggestions? Thanks
Hi,
You can check whether your gene names match the gene names in LRdb, which is accessible once the package is loaded (just type LRdb).
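A quick way to run that comparison is sketched below in base R. It uses a toy stand-in for LRdb so that it runs without the package. The assumption that the table has `ligand` and `receptor` columns, and all gene names in `my_genes`, are for illustration only.

```r
# Toy stand-in for LRdb (assumption: "ligand" and "receptor" columns).
lrdb_toy <- data.frame(ligand   = c("CCL19", "CXCL13", "IL7"),
                       receptor = c("CCR7",  "CXCR5",  "IL7R"),
                       stringsAsFactors = FALSE)
lr_genes <- unique(c(lrdb_toy$ligand, lrdb_toy$receptor))

# Made-up row names; some use a different capitalisation.
my_genes <- c("Ccl19", "Ccr7", "CXCL13", "Gapdh")

# Exact matches against the ligand/receptor symbols.
exact_hits <- sum(my_genes %in% lr_genes)

# Case-insensitive matches, to reveal capitalisation mismatches.
loose_hits <- sum(toupper(my_genes) %in% toupper(lr_genes))
```

With the real objects, `lrdb_toy` would be replaced by LRdb and `my_genes` by rownames(ND_merged). A large gap between the two counts would suggest that the symbols differ only in capitalisation.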
You should also try setting int.type = "autocrine" and see whether you get a lot of interactions. Depending on this I can give you 2 explanations:
Keep me posted on the result you get with int.type = "autocrine".
Cheers,
SCA
Clustering stops after several hours of computation with that error. The command was:
Thanks for any advice