aertslab / SCopeLoomR

R package (compatible with SCope) to create generic .loom files and extend them with other data e.g.: SCENIC regulons, Seurat clusters and markers, ...
MIT License

Issue with building a .loom from a sparseMatrix (e.g.: dgCMatrix) stored in SingleCellExperiment objects #3

Closed abhisheksinghnl closed 5 years ago

abhisheksinghnl commented 5 years ago

Hi,

I am trying to generate loom files for my experiment and I am running into an error at one of the stages. Could you please help me fix it?

My sce object looks like this:

class: SingleCellExperiment 
dim: 33694 5586 
metadata(0):
assays(2): counts logcounts
rownames(33694): ENSG00000243485 ENSG00000237613 ... ENSG00000277475 ENSG00000268674
rowData names(12): id symbol ... log10_total_counts use
colnames(5586): AAACCTGAGAAGGTTT-1 AAACCTGAGCGTTCCG-1 ... TTTGTCATCGTCTGCT-1 TTTGTCATCGTTGCCT-1
colData names(32): dataset barcode ... use outlier
reducedDimNames(0):
spikeNames(1): MT

On this object I ran the following code before generating the loom file, as directed in the tutorial:

> dgem <- counts(sce_1)
> dim(dgem)
[1] 33694  5586
> class(dgem)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
> head(colnames(dgem))
[1] "AAACCTGAGAAGGTTT-1" "AAACCTGAGCGTTCCG-1" "AAACCTGAGTACGTAA-1" 
"AAACCTGGTAAACACA-1" "AAACCTGGTCCATGAT-1" "AAACCTGTCAAACAAG-1"
> cell.info <- colData(sce_1)
> head(cell.info)
DataFrame with 6 rows and 32 columns
                     dataset            barcode total_features log10_total_features total_counts log10_total_counts
                   <integer>        <character>      <integer>            <numeric>    <numeric>          <numeric>
AAACCTGAGAAGGTTT-1         1 AAACCTGAGAAGGTTT-1           1373             3.137987         3380           3.529045
AAACCTGAGCGTTCCG-1         1 AAACCTGAGCGTTCCG-1           1837             3.264346         7332           3.865282
AAACCTGAGTACGTAA-1         1 AAACCTGAGTACGTAA-1           2259             3.354108        10099           4.004321
AAACCTGGTAAACACA-1         1 AAACCTGGTAAACACA-1            802             2.904716         1833           3.263399
AAACCTGGTCCATGAT-1         1 AAACCTGGTCCATGAT-1           2601             3.415307        12184           4.085826
AAACCTGTCAAACAAG-1         1 AAACCTGTCAAACAAG-1           1967             3.294025         7352           3.866465
                   pct_counts_top_50_features pct_counts_top_100_features pct_counts_top_200_features pct_counts_top_500_features

However, I get an error when I run the next command:

> cell.info$nGene <- colSums(dgem>0)
Error in colSums(dgem > 0) : 
  'x' must be an array of at least two dimensions

Could you please let me know how to fix this?

Thank you.

dweemx commented 5 years ago

Try:

cell.info$nGene <- Matrix::colSums(dgem>0)

If you don't have Matrix R package installed, you should install it:

install.packages("Matrix")
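
As a minimal sketch of why the `Matrix::` prefix matters: base `colSums()` expects an ordinary array, while `Matrix::colSums()` has methods for sparse classes such as `dgCMatrix`. The toy matrix `m` and its cell names below are made up for illustration and are not taken from the dataset in this thread.

```r
library(Matrix)

# Toy 3 genes x 2 cells sparse matrix (made-up values, illustration only)
m <- Matrix(c(0, 2, 0,
              1, 0, 3), nrow = 3, sparse = TRUE)
colnames(m) <- c("cell1", "cell2")

# Base colSums() is called explicitly here; tryCatch keeps the script
# running and stores the error message if the sparse input is rejected
base_result <- tryCatch(base::colSums(m > 0),
                        error = function(e) conditionMessage(e))

# The Matrix versions handle sparse input directly
n_gene <- Matrix::colSums(m > 0)  # number of genes detected per cell
n_umi  <- Matrix::colSums(m)      # total counts per cell
```

Calling `library(Matrix)` before running the tutorial code should have a similar effect, since it masks `colSums` with the Matrix generic.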

abhisheksinghnl commented 5 years ago

Hi,

Thank you for your reply. I installed the Matrix package and added the Matrix:: prefix to the command:

> cell.info$nGene <- Matrix::colSums(dgem>0)
> head(cell.info)
DataFrame with 6 rows and 31 columns
                     dataset            barcode total_features log10_total_features total_counts log10_total_counts
                   <integer>        <character>      <integer>            <numeric>    <numeric>          <numeric>
AAACCTGAGAAGGTTT-1         1 AAACCTGAGAAGGTTT-1           1373             3.137987         3380           3.529045
AAACCTGAGCGTTCCG-1         1 AAACCTGAGCGTTCCG-1           1837             3.264346         7332           3.865282
AAACCTGAGTACGTAA-1         1 AAACCTGAGTACGTAA-1           2259             3.354108        10099           4.004321
AAACCTGGTAAACACA-1         1 AAACCTGGTAAACACA-1            802             2.904716         1833           3.263399
AAACCTGGTCCATGAT-1         1 AAACCTGGTCCATGAT-1           2601             3.415307        12184           4.085826
AAACCTGTCAAACAAG-1         1 AAACCTGTCAAACAAG-1           1967             3.294025         7352           3.866465
                   pct_counts_top_50_features pct_counts_top_100_features pct_counts_top_200_features pct_counts_top_500_features

I no longer get the error at that step, but it now appears here:

> file.name <- "example.loom"
> loom <- build_loom(file.name=file.name,
+                    dgem=dgem,
+                    title="Fake expression dataset for examples",
+                    genome="Mouse", # Just for user information, not used internally
+                    default.embedding=default.tsne,
+                    default.embedding.name=default.tne.name)
[1] "Adding global attributes..."
[1] "Adding matrix..."
  |================================================================================================================================| 100%
[1] "Adding column attributes..."
[1] "Adding default metrics nUMI..."
Error in colSums(dgem) : 'x' must be an array of at least two dimensions

Any suggestions to fix this here?

Thank you.

abhisheksinghnl commented 5 years ago

Also, here is the head of Matrix::colSums(dgem):

> head (Matrix::colSums(dgem))
AAACCTGAGAAGGTTT-1 AAACCTGAGCGTTCCG-1 AAACCTGAGTACGTAA-1 AAACCTGGTAAACACA-1 AAACCTGGTCCATGAT-1 AAACCTGTCAAACAAG-1 
              3380               7332              10099               1833              12184               7352 

dweemx commented 5 years ago

This is a bug. I will fix this as soon as possible.

This should work as a temporary fix:

loom <- build_loom(file.name=file.name,
                   dgem=as.matrix(x = dgem),
                   title="Fake expression dataset for examples",
                   genome="Mouse", # Just for user information, not used internally
                   default.embedding=default.tsne,
                   default.embedding.name=default.tne.name)
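
One thing to keep in mind with this workaround is that `as.matrix()` turns the sparse matrix into a dense one, so every zero is stored explicitly. A rough back-of-the-envelope estimate, assuming 8 bytes per double-precision entry and the dimensions reported for `dgem` earlier in this thread (the variable names are illustrative only):

```r
# Dimensions reported by dim(dgem) earlier in the thread
n_genes <- 33694
n_cells <- 5586

# Assumption: a dense numeric matrix stores 8 bytes per entry
bytes_per_double <- 8

dense_bytes <- n_genes * n_cells * bytes_per_double
dense_gib   <- dense_bytes / 1024^3

cat(sprintf("Dense matrix needs roughly %.2f GiB\n", dense_gib))
```

If that does not fit in memory, subsetting the matrix before converting it is one way to reduce the footprint.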

abhisheksinghnl commented 5 years ago

Thank you, I got a bit further before stumbling again. :(

> loom <- build_loom(file.name=file.name,
+                    dgem=as.matrix(x = dgem),
+                    title="Fake expression dataset for examples",
+                    genome="Mouse", # Just for user information, not used internally
+                    default.embedding=sce_1.qc,
+                    default.embedding.name=default.tne.name)
[1] "Adding global attributes..."
[1] "Adding matrix..."
  |================================================================================================================================| 100%
[1] "Adding column attributes..."
[1] "Adding default metrics nUMI..."
[1] "Adding default metrics nGene..."
[1] "Adding default embedding..."
Error in as.vector(x) : no method for coercing this S4 class to a vector

sce_1.qc holds the t-SNE and looks like this:

class: SingleCellExperiment 
dim: 1841 3910 
metadata(0):
assays(2): counts logcounts
rownames(1841): ENSG00000187608 ENSG00000186891 ... ENSG00000198786 ENSG00000198727
rowData names(12): id symbol ... log10_total_counts use
colnames(3910): AAACCTGAGAAGGTTT-1 AAACCTGAGCGTTCCG-1 ... TTTGTCATCGTCTGCT-1 TTTGTCATCGTTGCCT-1
colData names(32): dataset barcode ... use outlier
reducedDimNames(1): TSNE
spikeNames(1): MT
> class(sce_1.qc)
[1] "SingleCellExperiment"
attr(,"package")
[1] "SingleCellExperiment"
> typeof(sce_1.qc)
[1] "S4"

How can I fix this one? I think the error is on my side and not in SCopeLoomR. Could you please suggest a way around it?

Thank you

dweemx commented 5 years ago

abhisheksinghnl commented 5 years ago

So, I did this:

> tsne <- reducedDim(sce_1.qc, "TSNE")[, 1:2]
> head(tsne)
                        [,1]       [,2]
AAACCTGAGAAGGTTT-1 -16.17201 -24.815360
AAACCTGAGCGTTCCG-1 -18.10361  45.005593
AAACCTGAGTACGTAA-1 -15.06162  43.073052
AAACCTGGTCCATGAT-1  31.84095   8.899309
AAACCTGTCAAACAAG-1  19.09773 -16.977060
AAACCTGTCATCGGAT-1  23.72917   9.700369

Then this:

> loom <- build_loom(file.name=file.name,
+                    dgem=as.matrix(x = dgem),
+                    title="Fake expression dataset for examples",
+                    genome="Mouse", # Just for user information, not used internally
+                    default.embedding=tsne,
+                    default.embedding.name=default.tne.name)
[1] "Adding global attributes..."
[1] "Adding matrix..."
  |================================================================================================================================| 100%
[1] "Adding column attributes..."
[1] "Adding default metrics nUMI..."
[1] "Adding default metrics nGene..."
[1] "Adding default embedding..."
Error in add_embedding(loom = loom, embedding = as.data.frame(default.embedding),  : 
  Some cells (1676) from expression matrix are missing in the embeddings: AAACCTGGTAAACACA-1,AAACGGGAGCCGGTAA-1,AAACGGGAGTGTCCCG-1,AAACGGGCATCACGTA-1,AAACGGGGTAACGACG-1,AAAGATGCAAGGTTTC-1,AAAGATGGTCGAAAGC-1,AAAGATGGTCGCGTGT-1,AAAGATGTCTGCGTAA-1,AAAGCAAGTGAGGCTA-1,AAAGCAAGTGCAGACA-1,AAAGCAATCGAATCCA-1,AAAGTAGAGTACGCCC-1,AAAGTAGCACCTATCC-1,AAAGTAGGTCGCATAT-1,AAATGCCCAAGCGAGT-1,AAATGCCGTCTCATCC-1,AAATGCCGTTCGCGAC-1,AAATGCCTCCACTCCA-1,AACACGTGTTATCGGT-1,AACACGTTCACATACG-1,AACCATGGTAGAAGGA-1,AACCATGGTCGAATCT-1,AACCATGTCTCTGAGA-1,AACCGCGAGCTTATCG-1,AACCGCGAGGAATCGC-1,AACCGCGAGGCTCTTA-1,AACCGCGCATGCGCAC-1,AACCGCGGTCGGGTCT-1,AACCGCGGTGAGTATA-1,AACCGCGGTTGATTCG-1,AACGTTGCACGGTTTA-1,AACGTTGGTCCATGAT-1,AACTCAGAGTCGTTTG-1,AACTCAGCAATAAGCA-1,AACTCAGGTACACCGC-1,AACTCAGGTCTCATCC-1,AACTCAGGTTGTGGAG-1,AACTCAGTCAGTTCGA-1,AACTCAGTCGCTAGCG-1,AACTCCCAGAACAACT-1,AACTCTTAGAGACTTA-1,AACTCTTCACCGGAAA-1,AACTCTTGTACCAGTT-1,AACTTTCAGTTGAGTA-1,AACTTTCCAATAGCAA-1,AACTTTCTCGCGATCG-1,AAGACCTAGATATGGT-1,AAGACC

I am close but still far :(

dweemx commented 5 years ago

SCopeLoomR requires the expression matrix (dgem) and the embeddings to contain the same cells. This is why you get this error.

You can solve this by making a subset of the expression matrix with only the cells present in the embeddings:

dgem.subset <- as.matrix(x = dgem[, row.names(tsne)])  # subset the sparse matrix first, then convert to dense
loom <- build_loom(file.name=file.name,
                      dgem=dgem.subset,
                      title="Fake expression dataset for examples",
                      genome="Mouse", # Just for user information, not used internally
                      default.embedding=tsne,
                      default.embedding.name=default.tne.name)
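
Before building the loom file, it may help to check which cells the two objects share. The sketch below uses made-up toy objects (`dgem_toy`, `tsne_toy`, and their cell names are hypothetical) and takes the intersection of cell names, a slight variation on the subsetting above that also reorders the embedding to match the matrix:

```r
library(Matrix)

# Toy data (made-up gene and cell names, illustration only)
dgem_toy <- Matrix(c(1, 0, 0, 2, 3, 0), nrow = 2, sparse = TRUE,
                   dimnames = list(c("geneA", "geneB"),
                                   c("c1", "c2", "c3")))
tsne_toy <- matrix(c(0.1, 0.2, 0.3, 0.4), ncol = 2,
                   dimnames = list(c("c3", "c1"), NULL))

# Cells in the expression matrix that have no embedding coordinates
missing_cells <- setdiff(colnames(dgem_toy), rownames(tsne_toy))

# Cells present in both, in expression-matrix order
shared_cells <- intersect(colnames(dgem_toy), rownames(tsne_toy))

# Subset both objects so they hold the same cells in the same order
dgem_sub <- dgem_toy[, shared_cells, drop = FALSE]
tsne_sub <- tsne_toy[shared_cells, , drop = FALSE]

stopifnot(identical(colnames(dgem_sub), rownames(tsne_sub)))
```

Inspecting `missing_cells` first shows how many cells were dropped upstream (for example by quality filtering) before anything is discarded.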

abhisheksinghnl commented 5 years ago

Thank you, it worked :)