dviraran / SingleR

SingleR: Single-cell RNA-seq cell types Recognition (legacy version)
GNU General Public License v3.0

ConvertSingleR2Browser #19

Closed hfberg closed 5 years ago

hfberg commented 5 years ago

I'm having trouble converting the SingleR object so that I can upload it to the browser. This is raw DGE data that I've downloaded. It has worked in Seurat previously, but never in SingleR. I've tried a couple of different ways to read in the data, but the same issue occurs. What have I missed?

> > raw_DGE_TEST <- read.table( file ="/home/proj/TEST/kidney2/GSM2906426_Kidney2_dge.txt.gz", header=TRUE)
> > 
> > singler6 <- CreateSinglerSeuratObject(counts = raw_DGE_TEST, project.name = 'Kidney2', species = "Mouse", fine.tune = F)

> [1] "Kidney2"
> [1] "Reading single-cell data..."
> [1] "Create Seurat object..."
> Performing log-normalization
> 0%   10   20   30   40   50   60   70   80   90   100%
> [----|----|----|----|----|----|----|----|----|----|
> **************************************************|
> Calculating gene means
> 0%   10   20   30   40   50   60   70   80   90   100%
> [----|----|----|----|----|----|----|----|----|----|
> **************************************************|
> Calculating gene variance to mean ratios
> 0%   10   20   30   40   50   60   70   80   90   100%
> [----|----|----|----|----|----|----|----|----|----|
> **************************************************|
> Regressing out: nUMI
>   |===================================================================================================================================================| 100%
> Time Elapsed:  33.442033290863 secs
> Scaling data matrix
>   |===================================================================================================================================================| 100%
> [1] "Creat SingleR object..."
> [1] "Dimensions of counts data: 19697x6202"
> [1] "Annotating data with Immgen..."
> [1] "Variable genes method: de"
> [1] "Number of DE genes:3389"
> [1] "Number of cells: 6202"
> [1] "Number of DE genes:3389"
> [1] "Number of clusters: 13"
> [1] "Annotating data with Immgen (Main types)..."
> [1] "Number of DE genes:2188"
> [1] "Number of cells: 6202"
> [1] "Number of DE genes:2188"
> [1] "Number of clusters: 13"
> [1] "Annotating data with Mouse-RNAseq..."
> [1] "Variable genes method: de"
> [1] "Number of DE genes:3555"
> [1] "Number of cells: 6202"
> [1] "Number of DE genes:3555"
> [1] "Number of clusters: 13"
> [1] "Annotating data with Mouse-RNAseq (Main types)..."
> [1] "Number of DE genes:2796"
> [1] "Number of cells: 6202"
> [1] "Number of DE genes:2796"
> [1] "Number of clusters: 13"

> > singler.new = convertSingleR2Browser(singler6)
> Error in names(x) <- value : 
>   'names' attribute [4] must be the same length as the vector [0]

> > traceback()
> 2: `colnames<-`(`*tmp*`, value = c(ref.names, paste0(ref.names, 
>        ".main")))
> 1: convertSingleR2Browser(singler6)
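
The `names` error above can be reproduced without SingleR. This is a minimal sketch, assuming the table being renamed ended up with zero columns (the thread does not show its actual contents); `empty_labels` and `ref_names` are made-up names used only for illustration:

```r
# Hypothetical stand-ins: an empty table and four column names,
# mirroring the "[4] ... [0]" lengths in the error message.
empty_labels <- data.frame()
ref_names <- c("ref1", "ref2", "ref1.main", "ref2.main")

# Assigning 4 names to a 0-column data frame fails; capture the message.
msg <- tryCatch(
  {
    colnames(empty_labels) <- ref_names
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Guarding on the column count skips the assignment instead of failing.
if (ncol(empty_labels) == length(ref_names)) {
  colnames(empty_labels) <- ref_names
}
```

The guard only illustrates why the lengths matter; it is not necessarily how the package itself was fixed.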

Loading the data directly from the file path doesn't work at all, I guess because it doesn't recognize the column names (?)

> singler7 <- CreateSinglerSeuratObject(counts = "/home/proj/TEST/kidney2/GSM2906426_Kidney2_dge.txt", project.name = '7CP1', species = "Mouse", fine.tune = F)
[1] "7CP1"
[1] "Reading single-cell data..."
Error in make.unique(colnames(counts)) : 
  'names' must be a character vector

> traceback()
3: make.unique(colnames(counts))
2: ReadSingleCellData(counts, annot)
1: CreateSinglerSeuratObject(counts = "/home/proj/TEST/kidney2/GSM2906426_Kidney2_dge.txt", 
       project.name = "7CP1", species = "Mouse", fine.tune = F)

dviraran commented 5 years ago

Hi,

Thanks. This is very helpful for fixing those small issues.

The first issue occurs because you set fine.tune=F, and the convert function wasn't ready for that... I added a fix for that. It should work now.

The second issue occurs because the separator in this file is a space and not a tab, and this function requires the file to be tab-delimited. I think it's fine that way. The documentation does state it must be a tab-delimited file, and using spaces is not very common.

Best, Dvir
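
For the separator issue, one option is to read the whitespace-separated file into R and then either pass the resulting table as `counts` (as in the first attempt above) or rewrite it as a tab-delimited file. A sketch with a toy file standing in for the real DGE file; the gene and cell names are invented, and whether SingleR's reader expects a name for the first header column is not stated in this thread:

```r
# Toy space-separated DGE file standing in for the real one.
space_file <- tempfile(fileext = ".txt")
writeLines(c("GENE cell1 cell2 cell3",
             "Actb 5 0 2",
             "Gapdh 1 3 0"), space_file)

# read.table's default sep = "" splits on any whitespace.
dge <- read.table(space_file, header = TRUE, row.names = 1)

# Rewrite as tab-delimited text; col.names = NA writes a blank header
# cell above the row names so the header lines up with the columns.
tab_file <- tempfile(fileext = ".txt")
write.table(dge, tab_file, sep = "\t", quote = FALSE, col.names = NA)

# Read the tab-delimited copy back and compare dimensions.
dge_tab <- read.table(tab_file, header = TRUE, row.names = 1, sep = "\t")
print(identical(dim(dge), dim(dge_tab)))
```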

hfberg commented 5 years ago

Perfect, thank you! :)

hfberg commented 5 years ago

I re-installed SingleR to get your updates and tried again. This is the result now. I hope I haven't missed anything obvious here; I'm still quite new to programming. As you said, it works when I set fine.tune = T, but I wanted to try a lighter version first for testing. This is the new output with fine.tune = F and your update:

> library(SingleR)
> raw_DGE_TEST <- read.table( file ="/home/proj/TEST/kidney2/GSM2906426_Kidney2_dge.txt", header=TRUE) 
> singler6 <- CreateSinglerSeuratObject(counts = raw_DGE_TEST, project.name = 'CP1',  species = "Mouse", fine.tune = F)
[1] "CP1"
[1] "Reading single-cell data..."
[1] "Create Seurat object..."
Performing log-normalization
...
(same output as in my last post)
...
[1] "Number of DE genes:2796"
[1] "Number of clusters: 13"
> 
> singler.new = convertSingleR2Browser(singler6)
Error in initialize(value, ...) : object 'labels1' not found
> traceback()
4: initialize(value, ...)
3: initialize(value, ...)
2: new("SingleR", project.name = singler$meta.data$project.name, 
       xy = singler$meta.data$xy, labels = labels, labels.NFT = labels1, 
       labels.clusters = labels.clusters, labels.clusters.NFT = labels.clusters1, 
       scores = scores, clusters = clusters, ident = ident, other = data.frame(singler$signatures), 
       expr = singler$seurat@data, meta.data = c(Citation = singler$singler[[1]]$about$Citation, 
           Organism = singler$singler[[1]]$about$Organism, Technology = singler$singler[[1]]$about$Technology))
1: convertSingleR2Browser(singler6)

dviraran commented 5 years ago

I made a small fix. Can you test it now?

-- Dvir

hfberg commented 5 years ago

Sorry, same thing. Am I updating the way you intended?

> devtools::install_github("dviraran/SingleR")
Downloading GitHub repo dviraran/SingleR@master
Skipping 2 packages not available: GSEABase, GSVA
✔  checking for file ‘/tmp/RtmpH5RCyJ/remotesd9c68096c01/dviraran-SingleR-293653c/DESCRIPTION’ ...
─  preparing ‘SingleR’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  looking to see if a ‘data/datalist’ file should be added
─  building ‘SingleR_0.2.2.tar.gz’ (14.1s)

Installing package into ‘/home/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
* installing *source* package ‘SingleR’ ...
** R
** data
*** moving datasets to lazyload DB
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (SingleR)
> singler.new = convertSingleR2Browser(singler6)
Error in new("SingleR", project.name = singler$meta.data$project.name,  : 
  object 'labels1' not found
> library("SingleR")
> raw_DGE <- read.table(file = "/home/proj/data/DGE/CP1_DGE.txt", header = TRUE, row.names = 1, colClasses =c("character", rep("numeric", 10000)))
> singler6 <- CreateSinglerSeuratObject(counts = raw_DGE, project.name = 'CP1',  species = "Mouse", fine.tune = F)
[1] "CP1"
[1] "Reading single-cell data..."
[1] "Create Seurat object..."
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene means
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variance to mean ratios
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Regressing out: nUMI
  |=================================================================================================================================================| 100%
Time Elapsed:  28.1569139957428 secs
Scaling data matrix
  |=================================================================================================================================================| 100%
[1] "Creat SingleR object..."
[1] "Dimensions of counts data: 18264x9961"
[1] "Annotating data with Immgen..."
[1] "Variable genes method: de"
[1] "Number of DE genes:3102"
[1] "Number of cells: 9961"
[1] "Number of DE genes:3102"
[1] "Number of clusters: 10"
[1] "Annotating data with Immgen (Main types)..."
[1] "Number of DE genes:1995"
[1] "Number of cells: 9961"
[1] "Number of DE genes:1995"
[1] "Number of clusters: 10"
[1] "Annotating data with Mouse-RNAseq..."
[1] "Variable genes method: de"
[1] "Number of DE genes:3122"
[1] "Number of cells: 9961"
[1] "Number of DE genes:3122"
[1] "Number of clusters: 10"
[1] "Annotating data with Mouse-RNAseq (Main types)..."
[1] "Number of DE genes:2440"
[1] "Number of cells: 9961"
[1] "Number of DE genes:2440"
[1] "Number of clusters: 10"
> singler.new = convertSingleR2Browser(singler6)
Error in new("SingleR", project.name = singler$meta.data$project.name,  : 
  object 'labels1' not found
> traceback()
4: initialize(value, ...)
3: initialize(value, ...)
2: new("SingleR", project.name = singler$meta.data$project.name, 
       xy = singler$meta.data$xy, labels = labels, labels.NFT = labels1, 
       labels.clusters = labels.clusters, labels.clusters.NFT = labels.clusters1, 
       scores = scores, clusters = clusters, ident = ident, other = data.frame(singler$signatures), 
       expr = singler$seurat@data, meta.data = c(Citation = singler$singler[[1]]$about$Citation, 
           Organism = singler$singler[[1]]$about$Organism, Technology = singler$singler[[1]]$about$Technology))
1: convertSingleR2Browser(singler6)

dviraran commented 5 years ago

This is very weird. I cannot replicate this error. Can you paste the output of just typing the function name:

convertSingleR2Browser

just want to see that you are using the updated version of that function.
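
A quick way to check which copy of a package the current session is using is sketched below. It uses the base `stats` package and its `median` function purely as stand-ins (substitute `SingleR` and `convertSingleR2Browser`). A namespace that was loaded before reinstalling can keep serving the old code until R is restarted.

```r
pkg <- "stats"       # stand-in; replace with "SingleR"
fn_name <- "median"  # stand-in; replace with "convertSingleR2Browser"

# Version and on-disk location of the copy R finds for this package.
pkg_version <- packageVersion(pkg)
pkg_path <- find.package(pkg)

# Whether the namespace is already loaded in this session.
already_loaded <- pkg %in% loadedNamespaces()

# The function object and the namespace it actually comes from.
fn <- getExportedValue(pkg, fn_name)
fn_namespace <- environmentName(environment(fn))

print(pkg_version)
print(pkg_path)
print(already_loaded)
print(fn_namespace)
```

If the reported path differs from the library the installer wrote to, or the namespace was already loaded before the reinstall, restarting R and loading the package again is a reasonable next step.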

hfberg commented 5 years ago

> convertSingleR2Browser
function (singler, use.singler.cluster.annot = T) 
{
    ref.names = unlist(lapply(singler$singler, FUN = function(x) x$about$RefData))
    cell.names = rownames(singler$singler[[1]]$SingleR.single$labels)
    labels = as.data.frame(sapply(singler$singler, FUN = function(x) x$SingleR.single$labels))
    if (!is.null(singler$singler[[1]]$SingleR.single.main)) {
        labels.main = as.data.frame(sapply(singler$singler, FUN = function(x) x$SingleR.single.main$labels))
        labels = cbind(labels, labels.main)
        colnames(labels) = c(ref.names, paste0(ref.names, ".main"))
    }
    else {
        colnames(labels) = c(ref.names)
    }
    rownames(labels) = cell.names
    if (!is.null(singler$singler[[1]]$SingleR.single$labels1)) {
        labels1 = as.data.frame(sapply(singler$singler, FUN = function(x) x$SingleR.single$labels1))
        if (!is.null(singler$singler[[1]]$SingleR.single.main)) {
            labels1.main = as.data.frame(sapply(singler$singler, 
                FUN = function(x) x$SingleR.single.main$labels1))
            labels1 = cbind(labels1, labels1.main)
            colnames(labels1) = c(ref.names, paste0(ref.names, 
                ".main"))
        }
        else {
            colnames(labels1) = c(ref.names)
        }
        rownames(labels1) = cell.names
    }
    labels.clusters = data.frame()
    labels.clusters1 = data.frame()
    if (use.singler.cluster.annot == T) {
        if (length(levels(singler$meta.data$clusters)) > 1) {
            if (!is.null(singler$singler[[1]]$SingleR.clusters)) {
                labels.clusters = as.data.frame(sapply(singler$singler, 
                  FUN = function(x) x$SingleR.clusters$labels))
                if (!is.null(singler$singler[[1]]$SingleR.clusters.main)) {
                  labels.clusters.main = as.data.frame(sapply(singler$singler, 
                    FUN = function(x) x$SingleR.clusters.main$labels))
                  labels.clusters = cbind(labels.clusters, labels.clusters.main)
                  colnames(labels.clusters) = c(ref.names, paste0(ref.names, 
                    ".main"))
                }
                else {
                  colnames(labels.clusters) = c(ref.names)
                }
                rownames(labels.clusters) = levels(singler$meta.data$clusters)
            }
            if (!is.null(singler$singler[[1]]$SingleR.cluster$labels1)) {
                if (!is.null(singler$singler[[1]]$SingleR.clusters)) {
                  labels.clusters1 = as.data.frame(sapply(singler$singler, 
                    FUN = function(x) x$SingleR.clusters$labels1))
                  if (!is.null(singler$singler[[1]]$SingleR.clusters.main)) {
                    labels.clusters.main = as.data.frame(sapply(singler$singler, 
                      FUN = function(x) x$SingleR.clusters.main$labels1))
                    labels.clusters1 = cbind(labels.clusters1, 
                      labels.clusters.main)
                    colnames(labels.clusters1) = c(ref.names, 
                      paste0(ref.names, ".main"))
                  }
                  else {
                    colnames(labels.clusters1) = c(ref.names)
                  }
                  rownames(labels.clusters1) = levels(singler$meta.data$clusters)
                }
            }
        }
    }
    scores = lapply(singler$singler, FUN = function(x) x$SingleR.single$scores)
    if (!is.null(singler$singler[[1]]$SingleR.single.main)) {
        scores.main = lapply(singler$singler, FUN = function(x) x$SingleR.single.main$scores)
        scores = c(scores, scores.main)
        names(scores) = c(ref.names, paste0(ref.names, ".main"))
    }
    else {
        names(scores) = c(ref.names)
    }
    clusters = data.frame(clusters = singler$meta.data$clusters)
    rownames(clusters) = cell.names
    ident = data.frame(orig.ident = singler$meta.data$orig.ident)
    rownames(ident) = cell.names
    singler.small = new("SingleR", project.name = singler$meta.data$project.name, 
        xy = singler$meta.data$xy, labels = labels, labels.NFT = labels1, 
        labels.clusters = labels.clusters, labels.clusters.NFT = labels.clusters1, 
        scores = scores, clusters = clusters, ident = ident, 
        other = data.frame(singler$signatures), expr = singler$seurat@data, 
        meta.data = c(Citation = singler$singler[[1]]$about$Citation, 
            Organism = singler$singler[[1]]$about$Organism, Technology = singler$singler[[1]]$about$Technology))
    singler.small
}
<bytecode: 0x28d23410>
<environment: namespace:SingleR>

dviraran commented 5 years ago

Yeah, this is still an older version. The new version contains the line: labels1 = data.frame()

https://github.com/dviraran/SingleR/blob/b96534dffc8469a63a197950fae5134589e6c0ae/R/SingleR.Object.R#L56

Try updating SingleR again...
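
Why that one line matters can be shown with a stripped-down sketch. The two functions below are simplified, hypothetical stand-ins for the old and new converter (not the package code), and the input assumes that a run without fine-tuning carries no `labels1` element:

```r
# Old pattern (sketch): `labels1` is only created inside the `if`,
# so using it afterwards fails when the condition is FALSE.
convert_old <- function(singler_single) {
  if (!is.null(singler_single$labels1)) {
    labels1 <- as.data.frame(singler_single$labels1)
  }
  list(labels.NFT = labels1)
}

# Fixed pattern (sketch): start from an empty data frame so the
# variable always exists, as in the line quoted above.
convert_new <- function(singler_single) {
  labels1 <- data.frame()
  if (!is.null(singler_single$labels1)) {
    labels1 <- as.data.frame(singler_single$labels1)
  }
  list(labels.NFT = labels1)
}

# Hypothetical result of a run without fine-tuning: no `labels1` element.
no_fine_tune <- list(labels = c("B cells", "T cells"))

old_result <- tryCatch(convert_old(no_fine_tune),
                       error = function(e) conditionMessage(e))
new_result <- convert_new(no_fine_tune)

print(old_result)
print(ncol(new_result$labels.NFT))
```

The old pattern fails with the same kind of "object 'labels1' not found" message shown in the tracebacks, while the fixed pattern returns an empty table.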

hfberg commented 5 years ago

Thank you! I updated the function manually, problem solved.