Download cluster CSV - GitHub issues

hfberg commented 5 years ago

Hi! I would like to download the CSV of the clusterings (in the "Cluster" tab) for my dataset. But when I press the "Download CSV" button, I only get an empty spreadsheet. What am I doing wrong? :)

[Screenshot from 2019-03-18 09-35-29]

dviraran commented 5 years ago

I guess the code is built so that it only works after you click the recluster button.

Note that you might just want to use the SingleR.Cluster function.

hfberg commented 5 years ago

Thank you for your quick reply! What do you mean by "recluster"? I've tried downloading the CSV from the "Cluster" tab in SingleRBrowser both before and after pressing the "Cluster" button, with the same result.

I also tried with the SingleR.Cluster function, but it throws the following error:

> SingleR.Cluster(singler, num.clust = 10, normalize_rows = F, normalize_cols = T)
Error in hclust(dist(scores, method = "euclidean"), method = "ward.D2") : 
  must have n >= 2 objects to cluster
> traceback()
3: stop("must have n >= 2 objects to cluster")
2: hclust(dist(scores, method = "euclidean"), method = "ward.D2")
1: SingleR.Cluster(singler, num.clusts = 10)

I looked into your code for SingleR.Cluster and found that you refer to an element called SingleR$scores. In my SingleR object there is no such element at that level.

Your function:

function (SingleR, num.clusts = 10, normalize_rows = F, normalize_cols = T) 
{
    if (normalize_rows == T) {
        SingleR$scores = scale(SingleR$scores)
    }
    if (normalize_cols == T) {
        scores = t(scale(t(SingleR$scores^3)))
    }
    else {
        scores = SingleR$scores
    }
    hc = hclust(dist(scores, method = "euclidean"), method = "ward.D2")
    cl = cutree(hc, k = num.clusts)
    list(hc = hc, cl = factor(cl))
}

However, I found scores in my SingleR object at the following level:

SingleR[["singler"]][[1]][["SingleR.single"]][["scores"]]

By plugging that score matrix into a test version of your function, I managed to create clusters, at least with the default settings. But I'm wondering: are you and I referring to the same type of scores? I don't want to use the wrong scores for clustering. :)
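A minimal base-R sketch of why the call fails, assuming the wrapper object nests the scores along the path above. The toy matrix, its cell and cell-type names, and the variable `wrapper` are hypothetical stand-ins, not real SingleR output. The point it shows: the top level has no `scores` element, so `SingleR$scores` evaluates to `NULL`, and `hclust()` refuses to cluster fewer than two rows, which is consistent with the traceback.

```r
# Hypothetical toy object: only the path to `scores` follows the one
# reported in this thread; the numbers and names are made up.
set.seed(42)
scores <- matrix(runif(8 * 3, min = 0.1, max = 0.4), nrow = 8,
                 dimnames = list(paste0("cell", 1:8),
                                 c("T_cells", "B_cells", "Monocytes")))
wrapper <- list(singler = list(list(SingleR.single = list(scores = scores))))

# At the top level there is no `scores` element, so this is NULL.
print(is.null(wrapper$scores))  # TRUE

# hclust() needs at least two rows; with a single row it stops with the
# same message seen in the traceback.
msg <- tryCatch(
  hclust(dist(matrix(c(0.1, 0.2, 0.3), nrow = 1)), method = "ward.D2"),
  error = conditionMessage
)
print(msg)  # "must have n >= 2 objects to cluster"
```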

Also, another question: When is it appropriate to normalize by rows (single cells) or columns (cell type)?

dviraran commented 5 years ago

Sorry that it isn't clear. The basic SingleR object is SingleR.single/cluster. Wherever a function in the package takes a SingleR argument, you should pass the SingleR.single/cluster/main object. The upper levels are what the wrapper functions create.
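Following that advice, here is a sketch of passing the SingleR.single level instead of the wrapper. So that it runs without the package installed, it uses a hypothetical toy object and a hypothetical helper, `cluster_scores()`, which re-implements only the default branch (`normalize_rows = F`, `normalize_cols = T`) of the SingleR.Cluster source quoted earlier in the thread. With SingleR loaded, you would call `SingleR.Cluster()` on the same object instead.

```r
# Hypothetical toy object mimicking the nested structure reported above.
set.seed(42)
scores <- matrix(runif(8 * 3, min = 0.1, max = 0.4), nrow = 8,
                 dimnames = list(paste0("cell", 1:8),
                                 c("T_cells", "B_cells", "Monocytes")))
singler <- list(singler = list(list(SingleR.single = list(scores = scores))))

# Select the SingleR.single level: the object the package functions expect.
sr <- singler$singler[[1]]$SingleR.single

# With the package loaded, the call would be:
#   SingleR.Cluster(sr, num.clusts = 10)
# Hypothetical local copy of the default path of the quoted function.
cluster_scores <- function(obj, num.clusts = 10) {
  scaled <- t(scale(t(obj$scores^3)))  # cube, then z-score each row (cell)
  hc <- hclust(dist(scaled, method = "euclidean"), method = "ward.D2")
  list(hc = hc, cl = factor(cutree(hc, k = num.clusts)))
}

res <- cluster_scores(sr, num.clusts = 3)  # 3 clusters for 8 toy cells
print(table(res$cl))
```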

> Also, another question: When is it appropriate to normalize by rows (single cells) or columns (cell type)?

The default is to normalize columns, which makes it easy to see which cell type has the highest score for a given single cell. But this may be misleading, as the correlations are very low, so it is always a good idea to also look at the scores without any normalization. Normalizing by row shows which single cells are most correlated with a given cell type. This might be useful for identifying cell types that are not available in the reference.
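To make the two options concrete, here is a small sketch on a hypothetical toy score matrix. It assumes rows are single cells and columns are reference cell types, which is the orientation in which SingleR.Cluster clusters cells. `per_cell` mirrors the `normalize_cols` branch of the quoted source, and `per_type` mirrors the `normalize_rows` branch. The raw maxima are printed alongside so the low absolute scores stay visible.

```r
# Hypothetical toy scores: rows = single cells, columns = reference types.
set.seed(7)
scores <- matrix(runif(6 * 3, min = 0.1, max = 0.4), nrow = 6,
                 dimnames = list(paste0("cell", 1:6),
                                 c("T_cells", "B_cells", "Monocytes")))

# Like the `normalize_cols` branch: cube, then z-score each cell across
# cell types, so each cell's top-scoring type stands out.
per_cell <- t(scale(t(scores^3)))

# Like the `normalize_rows` branch: z-score each cell type across cells,
# so the cells most correlated with a given type stand out.
per_type <- scale(scores)

# Compare against the raw scores, which stay low in absolute terms.
best_type_raw  <- colnames(scores)[apply(scores, 1, which.max)]
best_type_norm <- colnames(scores)[apply(per_cell, 1, which.max)]
print(data.frame(best_type_raw, best_type_norm,
                 max_raw = apply(scores, 1, max)))
print(round(per_type, 2))
```

Cubing and per-row z-scoring are both increasing transformations within a row, so the per-cell normalization changes how large the differences look, not which type scores highest for that cell.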

hfberg commented 5 years ago

Ok, that explains a lot, thank you! :)

dviraran / SingleR

Download cluster CSV #46