WinklerLaura closed this issue 3 years ago
Hi @WinklerLaura,
To point you in the right direction, may I ask why you need to generate a loom file? Is it for pre-processing? Or does a particular tool that you want to use need a loom file as input? Is it SCope in particular?
Regarding your other question about the example datasets: do you mean the ones available on SCope? To work with SCope, datasets need to be processed first. We are developing single-cell pipelines that do this job and generate a SCope-ready loom file. See: https://github.com/vib-singlecell-nf/vsn-pipelines. We also have some documentation: https://vsn-pipelines.readthedocs.io/en/latest/?badge=latest
Hi @dweemx,
Thank you for your fast reply! I am working with an R script that was built to manipulate datasets, for example to do clustering. So far the input datasets were already loom files. Now I would like to use a dataset from the Allen Brain Atlas, but its files are csv files, so in order to use them in the R script I need to create a loom file from the csv file. The R script uses the R package SCopeLoomR for further analysis.
If I understand you correctly, you mean I cannot convert a csv file into a loom file to use it with the package, but have to preprocess the data in the csv file first?
No no, you don't need to pre-process it first, since you already have some R scripts that require a loom file as input.
I'll make a small snippet to convert a csv to a loom file using SCopeLoomR. I'll try to make this today.
Thank you so much!! That would be amazing!
hi laura,
i can't recapitulate your SCopeLoomR problem, because you haven't given the exact code (a minimal example would have been useful). here is how i would create a simple loom file with an embedding and a bit of meta data:
library(SCopeLoomR)
counts <- read.table("raw_counts.tsv")
mdata <- read.table("meta_data.tsv")
tsne <- read.table("tsne.tsv")
build_loom(
  file.name = "test.loom",
  dgem = counts,
  title = "test example",
  genome = "human",
  default.embedding = tsne[colnames(counts), ],
  default.embedding.name = "tSNE"
)
## open the loom file to modify it
loom <- open_loom("test.loom", mode = "r+")
## now add meta data
add_col_attr(loom = loom, key = "percent.mito",
             value = mdata[colnames(counts), "percent.mito"], as.metric = TRUE)
add_col_attr(loom = loom, key = "Cell_Line",
             value = mdata[colnames(counts), "Cell_Line"], as.annotation = TRUE)
## don't forget to close the loom file
close_loom(loom)
sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /vsc-hard-mounts/leuven-apps/skylake/2018a/software/OpenBLAS/0.2.20-GCC-6.4.0-2.28/lib/libopenblas_haswellp-r0.2.20.so
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SCopeLoomR_0.10.2
loaded via a namespace (and not attached):
[1] magrittr_1.5 usethis_1.6.1 devtools_2.3.0 bit_4.0.4
[5] pkgload_1.1.0 rjson_0.2.20 R6_2.5.0 rlang_0.4.8
[9] fansi_0.4.1 tools_4.0.0 pkgbuild_1.1.0 sessioninfo_1.1.1
[13] cli_2.1.0 hdf5r_1.3.3 withr_2.3.0 ellipsis_0.3.1
[17] remotes_2.2.0 bit64_4.0.5 assertthat_0.2.1 digest_0.6.27
[21] rprojroot_1.3-2 crayon_1.3.4 processx_3.4.4 callr_3.5.1
[25] fs_1.4.1 ps_1.4.0 testthat_3.0.0 curl_4.3
[29] memoise_1.1.0 glue_1.4.2 compiler_4.0.0 desc_1.2.0
[33] backports_1.2.0 prettyunits_1.1.1
if this code doesn't work for you, please post an example with (a subset of) your data.
as for your second question about the mouse brain data set, you can explore it on this website: https://celltypes.brain-map.org/rnaseq/mouse_ctx-hip_10x and with the code above it should be straightforward to download the counts, the meta data and the UMAP and make a loom file yourself.
Hi tropfenameimer,
I think now with this example I understand how to do it! thank you so much for your help!
Closing this issue since it seems to have been solved.
Hi,
I ran into a few issues that I was able to solve, but now I am stuck again, as it is giving me this error:
Error in get_dtype(x = dgem[1, 1]) : Unknown data type: factor
Calls: build_loom ... tryCatch -> tryCatchList -> tryCatchOne ->
Infos:
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SCopeLoomR_0.10.2 devtools_2.2.1 usethis_1.5.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 magrittr_2.0.1 bit_4.0.4 pkgload_1.0.2
[5] R6_2.5.0 rlang_0.4.1 tools_3.5.3 pkgbuild_1.0.6
[9] sessioninfo_1.1.1 cli_1.1.0 hdf5r_1.3.3 withr_2.1.2
[13] ellipsis_0.3.0 remotes_2.1.0 bit64_4.0.5 assertthat_0.2.1
[17] digest_0.6.22 rprojroot_1.3-2 crayon_1.3.4 processx_3.4.1
[21] callr_3.3.2 fs_1.3.1 ps_1.3.0 curl_4.2
[25] testthat_2.2.1 memoise_1.1.0 glue_1.3.1 compiler_3.5.3
[29] desc_1.2.0 backports_1.1.5 prettyunits_1.0.2
code:
sessionInfo()
counts <- read.table("..data/matrix_GABA.csv")
mdata <- read.table("..data/metadata_GABA.csv")
tsne <- read.table("..data/tsne_GABA.csv")
build_loom(
  file.name = "million_GABA_SCopeLoomR.loom",
  dgem = counts,
  title = "GABA cells",
  genome = "mouse",
  default.embedding = tsne[colnames(counts),],
  default.embedding.name = "tSNE"
)
print('loomfile is successfully created')
loom <- open_loom("test.loom", mode = "r+")
add_col_attr(loom = loom, key = "percent.mito",
add_col_attr(loom = loom, key = "Cell_Line",
close_loom(loom)
I have been trying to solve this for days now, but I didn't get any further. Maybe you can tell me what I am doing wrong, or where I can search for answers to solve an issue like this? Does the error mean that I have a wrong data type in the dataset (factor)? Sorry for bothering you again. Kind regards, Laura
Hi @WinklerLaura, I think the error is due to this line:
counts <- read.table("..data/matrix_GABA.csv")
I don't know what your csv file looks like, but I'd use the following command:
counts <- read.table("..data/matrix_GABA.csv", header=TRUE, row.names=1, quote='', stringsAsFactors=FALSE)
It is also good to check afterwards whether the data matrix was read correctly by inspecting counts.
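A minimal sketch of such a check, using a small inline table in place of the real file (the gene and cell names and the helper non_numeric are made up for illustration). It assumes the file is comma-separated; read.table splits on whitespace unless sep is given, so sep = "," is added here as an assumption about the data:

```r
# Toy CSV text standing in for the real file (hypothetical content).
csv_text <- "gene,cellA,cellB\nXkr4,8,13\nRp1,0,2\n"

# Default read.table(): whitespace-separated, so each line becomes one text field.
bad <- read.table(text = csv_text)

# With header, row names and separator given, the counts come in as numbers.
good <- read.table(text = csv_text, header = TRUE, row.names = 1,
                   sep = ",", stringsAsFactors = FALSE)

# Hypothetical helper: names of all columns that are not numeric.
non_numeric <- function(df) names(df)[!vapply(df, is.numeric, logical(1))]

str(bad)           # a single text column holding whole lines
non_numeric(bad)   # lists the offending column
non_numeric(good)  # empty when every column is numeric
```

If non_numeric() returns any column names for the real counts table, those columns were probably read as text or factors, which would be consistent with the "Unknown data type: factor" error.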
Thank you so much, I am trying it right now!
|======================================================================| 100%[1] "Adding column attributes..."
Error in FUN(left, right) : non-numeric argument to binary operator
Calls: build_loom ... tryCatch -> tryCatchList -> tryCatchOne ->
still getting an error =/
That log alone is not very helpful. I'll need a full example, including the matrix, to reproduce your error. Could you provide that?
oh ok, I understand.
This is the information about the computer and the R version I used:
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SCopeLoomR_0.10.2 devtools_2.2.1 usethis_1.5.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 magrittr_2.0.1 bit_4.0.4 pkgload_1.0.2
[5] R6_2.5.0 rlang_0.4.1 tools_3.5.3 pkgbuild_1.0.6
[9] sessioninfo_1.1.1 cli_1.1.0 hdf5r_1.3.3 withr_2.1.2
[13] ellipsis_0.3.0 remotes_2.1.0 bit64_4.0.5 assertthat_0.2.1
[17] digest_0.6.22 rprojroot_1.3-2 crayon_1.3.4 processx_3.4.1
[21] callr_3.3.2 fs_1.3.1 ps_1.3.0 curl_4.2
[25] testthat_2.2.1 memoise_1.1.0 glue_1.3.1 compiler_3.5.3
[29] desc_1.2.0 backports_1.1.5 prettyunits_1.0.2
[1] "Adding global attributes..."
[1] "Adding matrix..."
|======================================================================| 100%[1] "Adding column attributes..."
Error in FUN(left, right) : non-numeric argument to binary operator
Calls: build_loom ... tryCatch -> tryCatchList -> tryCatchOne ->
The matrix I used was the one from the Allen Brain Atlas: https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-10x. There I used these 3 files:
Afterwards, I followed the instructions from tropfenameimer using this Rscript:
library(devtools) # To install packages as SCopeLoomR
devtools::install_github("aertslab/SCopeLoomR")
library(SCopeLoomR) # to convert the csv file into loom file
sessionInfo()
counts <- read.table("../data/matrix_GABA.csv", header=TRUE, row.names=1, qu$
mdata <- read.table("../data/metadata_GABA.csv", header=TRUE, row.names=1, q$
tsne <- read.table("../data/tsne_GABA.csv", header=TRUE, row.names=1, quote=$
build_loom(
  file.name = "million_GABA_SCopeLoomR.loom",
  dgem = counts,
  title = "GABA cells",
  genome = "mouse",
  default.embedding = tsne[colnames(counts),],
  default.embedding.name = "tSNE"
)
print('loomfile is successfully created')
loom <- open_loom("test.loom", mode = "r+")
loom <- open_loom("test.loom", mode = "r+")
add_col_attr(loom = loom, key = "percent.mito",
add_col_attr(loom = loom, key = "Cell_Line",
close_loom(loom)
Thank you so much for helping me!
hi laura, since you didn't send a reproducible example, i made one with data from the allen brain map (not the large mouse data set, but a smaller one). an issue with the allen data might be that the expression matrix is not in gene x cell format. but i haven't verified this for the data set you want to process.
can you please try to reproduce the example below and in case there are issues post your complete code with error messages here?
## download data from allen brain map (you can do this manually)
wget https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/matrix.csv
wget https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/metadata.csv
wget https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/tsne.csv
## in R: create loom file:
library(SCopeLoomR)
library(data.table) #quicker data import with fread() than with read.csv()
## import counts matrix -> transpose to gene x cell format
counts <- fread("matrix.csv", sep = ",")
cellnames <- counts[, sample_name]
counts <- t(counts[,2:ncol(counts)])
colnames(counts) <- cellnames
counts[1:5,1:3]
# F2S4_160113_027_A01 F2S4_160113_027_B01 F2S4_160113_027_C01
#3.8-1.2 0 0 0
#3.8-1.3 0 0 0
#3.8-1.4 0 0 0
#3.8-1.5 0 0 0
#5-HT3C2 0 0 0
mdata <- read.csv("metadata.csv", row.names = 1)
mdata[1:5,1:5]
# exp_component_name specimen_type cluster_color
#F2S4_160113_027_A01 LS-15005h_S01_E1-50 nucleus
#F2S4_160113_027_B01 LS-15005h_S02_E1-50 nucleus #E170FE
#F2S4_160113_027_C01 LS-15005h_S03_E1-50 nucleus #8E5864
#F2S4_160113_027_D01 LS-15005h_S04_E1-50 nucleus #8B5862
#F2S4_160113_027_E01 LS-15005h_S05_E1-50 nucleus #CF6EC9
# cluster_order cluster_label
#F2S4_160113_027_A01 NA
#F2S4_160113_027_B01 32 Inh L2-5 VIP TOX2
#F2S4_160113_027_C01 2 Inh L1 LAMP5 GGT8P
#F2S4_160113_027_D01 1 Inh L1 LAMP5 NDNF
#F2S4_160113_027_E01 34 Inh L1-3 VIP ZNF322P1
tsne <- read.csv("tsne.csv", row.names = 1)
head(tsne)
# tsne_1 tsne_2
#F2S4_160113_027_B01 32.15014 4.800442
#F2S4_160113_027_C01 25.06135 -27.205498
#F2S4_160113_027_D01 25.36504 -25.813437
#F2S4_160113_027_E01 27.07534 13.713614
#F2S4_160113_027_F01 27.66046 2.411001
#F2S4_160113_027_G01 -8.20726 -33.097583
## the tsne data don't contain all cells that are present in counts & metadata
## -> remove missing cells from counts data
keepcells <- rownames(tsne)[rownames(tsne) %in% colnames(counts)]
counts <- subset(counts, , keepcells)
build_loom(
  file.name = "test.loom",
  dgem = counts,
  title = "test example",
  genome = "human",
  default.embedding = tsne[colnames(counts), ],
  default.embedding.name = "tSNE"
)
#Warning message:
#In if (class(x = dgem) == "dgTMatrix") { :
# the condition has length > 1 and only the first element will be used
#-> you can ignore this warning
loom <- open_loom("test.loom", mode = "r+")
add_col_attr(loom = loom, key = "cluster_label",
             value = mdata[colnames(counts), "cluster_label"], as.annotation = TRUE)
close_loom(loom)
sessionInfo()
#R version 4.0.0 (2020-04-24)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: CentOS Linux 7 (Core)
#
#Matrix products: default
#BLAS/LAPACK: /vsc-hard-mounts/leuven-apps/skylake/2018a/software/OpenBLAS/0.2.20-GCC-6.4.0-2.28/lib/libopenblas_haswellp-r0.2.20.so
#
#locale:
#[1] C
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
#[1] data.table_1.13.2 SCopeLoomR_0.10.2
#
#loaded via a namespace (and not attached):
#[1] bit_4.0.4 compiler_4.0.0 rjson_0.2.20 R6_2.5.0 tools_4.0.0
#[6] hdf5r_1.3.3 bit64_4.0.5
Hi, I have tried the code but get another error. I would like to provide a reproducible example, but how can I do that with a file of that size? Would it help to show you the output of the head() function, for example the first two rows? Or to show you the script I used to filter the file?
library(devtools) # To install packages as SCopeLoomR
devtools::install_github("aertslab/SCopeLoomR")
install.packages("data.table")
library(SCopeLoomR) # to convert the csv file into loom file
library(data.table) # quicker data import with fread() than with read.csv
counts <- fread("../data/matrix_GABA.csv", sep = ",")
cellnames <- counts[, sample_name]
counts <- t(counts[,2:ncol(counts)])
colnames(counts) <- cellnames
counts[1:5,1:3]
mdata <- read.csv("../data/metadata_GABA.csv", row.names=1)
mdata[1:5,1:5]
tsne <- read.csv("../data/tsne_GABA.csv", row.names=1)
head(tsne)
keepcells <- rownames(tsne)[rownames(tsne) %in% colnames(counts)]
counts <- subset(counts, , keepcells)
build_loom(
  file.name = "million_GABA_SCopeLoomR.loom",
  dgem = counts,
  title = "GABA cells",
  genome = "mouse",
  default.embedding = tsne[colnames(counts),],
  default.embedding.name = "tSNE"
)
loom <- open_loom("test.loom", mode = "r+")
add_col_attr(loom = loom, key = "percent.mito",
add_col_attr(loom = loom, key = "Cell_Line",
close_loom(loom)
Error message:
Error in [.data.table(counts, , sample_name) :
j (the 2nd argument inside [...]) is a single symbol but column name 'sample_name' is not found. Perhaps you intended DT[, ..sample_name]. This difference to data.frame is deliberate and explained in FAQ 1.1.
Calls: [ -> [.data.table
Execution halted
As you said you need to see the whole error message:
installing source package 'SCopeLoomR' ...
R
data
* moving datasets to lazyload DB
Warning: namespace 'SummarizedExperiment' is not available and has been replaced by .GlobalEnv when processing object 'sce'
Warning: namespace 'SummarizedExperiment' is not available and has been replaced by .GlobalEnv when processing object 'sce'
inst
byte-compile and prepare package for lazy loading
* help
installing help indices
building package indices
installing vignettes
** testing if installed package can be loaded
downloaded 5.0 MB
installing source package 'data.table' ...
package 'data.table' successfully unpacked and MD5 sums checked
gcc -std=gnu99 4.9.2
zlib 1.2.8 is available ok
R CMD SHLIB supports OpenMP without any extra hint
libs
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c assign.c -o assign.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c between.c -o between.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c bmerge.c -o bmerge.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c chmatch.c -o chmatch.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c cj.c -o cj.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c coalesce.c -o coalesce.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c dogroups.c -o dogroups.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c fastmean.c -o fastmean.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c fcast.c -o fcast.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c fifelse.c -o fifelse.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c fmelt.c -o fmelt.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c forder.c -o forder.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c frank.c -o frank.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c fread.c -o fread.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c freadR.c -o freadR.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c froll.c -o froll.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c frollR.c -o frollR.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c frolladaptive.c -o frolladaptive.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c fsort.c -o fsort.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c fwrite.c -o fwrite.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c fwriteR.c -o fwriteR.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c gsumm.c -o gsumm.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c ijoin.c -o ijoin.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c init.c -o init.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c inrange.c -o inrange.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c nafill.c -o nafill.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c nqrecreateindices.c -o nqrecreateindices.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c openmp-utils.c -o openmp-utils.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c quickselect.c -o quickselect.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c rbindlist.c -o rbindlist.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c reorder.c -o reorder.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c shift.c -o shift.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c snprintf.c -o snprintf.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c subset.c -o subset.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c transpose.c -o transpose.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c types.c -o types.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c uniqlist.c -o uniqlist.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c utils.c -o utils.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c vecseq.c -o vecseq.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c wrappers.c -o wrappers.o
gcc -std=gnu99 -shared -L/usr/lib/R/lib -Wl,-z,relro -o data.table.so assign.o between.o bmerge.o chmatch.o cj.o coalesce.o dogroups.o fastmean.o fcast.o fifelse.o fmelt.o forder.o frank.o fread.o freadR.o froll.o frollR.o frolladaptive.o fsort.o fwrite.o fwriteR.o gsumm.o ijoin.o init.o inrange.o nafill.o nqrecreateindices.o openmp-utils.o quickselect.o rbindlist.o reorder.o shift.o snprintf.o subset.o transpose.o types.o uniqlist.o utils.o vecseq.o wrappers.o -fopenmp -lz -L/usr/lib/R/lib -lR
PKG_CFLAGS = -fopenmp
PKG_LIBS = -fopenmp -lz
if [ "data.table.so" != "datatable.so" ]; then mv data.table.so datatable.so; fi
if [ "" != "Windows_NT" ] && [ uname -s = 'Darwin' ]; then install_name_tool -id datatable.so datatable.so; fi
installing to /usr/local/lib/R/site-library/data.table/libs
R
inst
byte-compile and prepare package for lazy loading
help
* installing help indices
building package indices
installing vignettes
testing if installed package can be loaded
DONE (data.table)
The downloaded source packages are in '/tmp/RtmpsdzVWN/downloaded_packages'
Attaching package: 'SCopeLoomR'
The following object is masked from 'package:base':
flush
Error in [.data.table(counts, , sample_name) :
j (the 2nd argument inside [...]) is a single symbol but column name 'sample_name' is not found. Perhaps you intended DT[, ..sample_name]. This difference to data.frame is deliberate and explained in FAQ 1.1.
Calls: [ -> [.data.table
Execution halted
Please let me know if I can give you more information about the dataset I used, or if you need anything else! Thank you so much! Laura
hi laura,
the matrix.csv i downloaded contains a column named "sample_name". you are trying to run my code on another data set, and i guess your data frame "matrix_GABA.csv" does not contain such a column. hence the error message. can you please post an excerpt of all data frames you are using (e.g. counts[1:5,1:3])?
katina
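The check described above can be sketched in base R as follows. get_cell_names is a hypothetical helper and the two small tables are made-up stand-ins for the real files; names() and [[ ]] behave the same way on a data.table as on a data.frame:

```r
# Hypothetical helper: return the cell-ID column, or stop with a readable message.
get_cell_names <- function(df, id_col = "sample_name") {
  if (!id_col %in% names(df)) {
    stop("column '", id_col, "' not found; the first columns are: ",
         paste(head(names(df), 5), collapse = ", "))
  }
  df[[id_col]]
}

# Toy table with a proper header row.
with_header <- data.frame(sample_name = c("cell1", "cell2"),
                          Xkr4 = c(8, 13),
                          stringsAsFactors = FALSE)

# Toy table whose header row was lost, so R assigned V1, V2, ...
no_header <- data.frame(V1 = c("cell1", "cell2"),
                        V2 = c(8, 13),
                        stringsAsFactors = FALSE)

get_cell_names(with_header)

# Capture the error message instead of halting the script.
result <- tryCatch(get_cell_names(no_header),
                   error = function(e) conditionMessage(e))
print(result)
```

Column names such as V1, V2, V3 in the output of names(counts) are a hint that the file was read without a header line.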
Hi Katina,
I think I know now what I did wrong. These are the datasets I used:
[1] "matrix:"
                                    V1 V2 V3
1: AAAGTAGGTGAGGCTA-L8TX_180221_01_F09  9  0
2: ACCTTTATCGACAGCC-L8TX_180221_01_F09 13  0
3: ACCTTTATCTTCTGGC-L8TX_180221_01_F09  8  1
4: ACGAGCCAGATATGGT-L8TX_180221_01_F09  3  1
5: ACGAGCCAGCTGAACG-L8TX_180221_01_F09  8  0
[1] "metadata:"
                                    CCATGTCAGCGCTTAT.2L8TX_180221_01_C11 X10x
TGGTTCCTCACGACTA-L8TX_180221_01_E11 TGGTTCCTCACGACTA-4L8TX_180221_01_E11  10x
CTACGTCCAACTGGCC-L8TX_181011_01_B03 CTACGTCCAACTGGCC-6L8TX_181011_01_B03  10x
CTAGAGTCACGCGAAA-L8TX_171026_01_B05 CTAGAGTCACGCGAAA-6L8TX_171026_01_B05  10x
ATAGACCTCCCAAGAT-L8TX_180221_01_D09 ATAGACCTCCCAAGAT-4L8TX_180221_01_D09  10x
GTTCGGGCAATCGAAA-L8TX_181206_01_B12 GTTCGGGCAATCGAAA-3L8TX_181206_01_B12  10x
                                    X.C60C0F X2 X2_Meis2
TGGTTCCTCACGACTA-L8TX_180221_01_E11  #8C5962  7  7_Lamp5
CTACGTCCAACTGGCC-L8TX_181011_01_B03  #8C5962  7  7_Lamp5
CTAGAGTCACGCGAAA-L8TX_171026_01_B05  #8C5962  7  7_Lamp5
ATAGACCTCCCAAGAT-L8TX_180221_01_D09  #8C5962  7  7_Lamp5
GTTCGGGCAATCGAAA-L8TX_181206_01_B12  #8C5962  7  7_Lamp5
[1] "tsne:"
                                    X5.02848805333333 X19.2537504.56 X2
TGGTTCCTCACGACTA-L8TX_180221_01_E11 -9.515233 4.96538642\t472 2
CTACGTCCAACTGGCC-L8TX_181011_01_B03 -10.224533 5.32504034666667\t473 2
CTAGAGTCACGCGAAA-L8TX_171026_01_B05 -10.942459 6.26142136666667\t474 2
ATAGACCTCCCAAGAT-L8TX_180221_01_D09 -8.635556 5.56036584\t475 2
GTTCGGGCAATCGAAA-L8TX_181206_01_B12 -9.835173 6.1383559\t476 2
GACGTGCCACACCGAC-L8TX_171026_01_G04 -9.510277 7.67432181333333\t477 2
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
I filtered all three files for "GABAergic" in a specific column (I think it was class_label of the metadata). But I've now realized that this of course filtered out the first row (with all the headers like "sample_name") as well. So if I correct the filtering (keep the headers), it should work with your code above, I guess. Sorry for the stupid mistake, I am really new to all this. Thank you so much for your help!! I will correct it immediately and tell you if it works then!
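One way to avoid losing the header is to filter after the files have been parsed, rather than filtering raw text lines. A minimal sketch with made-up toy tables (the column names sample_name and class_label follow the thread; whether the real files use exactly these names is an assumption):

```r
# Toy metadata and cell x gene table (hypothetical values).
mdata <- data.frame(
  sample_name = c("c1", "c2", "c3"),
  class_label = c("GABAergic", "Glutamatergic", "GABAergic"),
  stringsAsFactors = FALSE
)
counts <- data.frame(
  sample_name = c("c1", "c2", "c3"),
  Xkr4 = c(8, 13, 3),
  Rp1 = c(0, 2, 1),
  stringsAsFactors = FALSE
)

# Select cells on the parsed table, so the column names are never a data row.
keep <- mdata$sample_name[mdata$class_label == "GABAergic"]
mdata_gaba <- mdata[mdata$sample_name %in% keep, ]
counts_gaba <- counts[counts$sample_name %in% keep, ]

# write.csv() writes the column names as the first line of the file.
out_file <- tempfile(fileext = ".csv")
write.csv(counts_gaba, out_file, row.names = FALSE)
readLines(out_file, n = 1)
```

Reading the written file back with read.csv() and comparing names() against the original table is a quick way to confirm the header survived.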
Hi, I tried different things, as I still got some errors. To be 100% sure that it is not due to my preprocessing, I took all three files directly from the Allen Brain Atlas and ran the script above, but it tells me "unexpected end of input".
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.13.4 SCopeLoomR_0.10.2 devtools_2.2.1 usethis_1.5.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 magrittr_2.0.1 bit_4.0.4 pkgload_1.0.2
[5] R6_2.5.0 rlang_0.4.1 tools_3.5.3 pkgbuild_1.0.6
[9] sessioninfo_1.1.1 cli_1.1.0 hdf5r_1.3.3 withr_2.1.2
[13] ellipsis_0.3.0 remotes_2.1.0 bit64_4.0.5 assertthat_0.2.1
[17] digest_0.6.22 rprojroot_1.3-2 crayon_1.3.4 processx_3.4.1
[21] callr_3.3.2 fs_1.3.1 ps_1.3.0 curl_4.2
[25] testthat_2.2.1 memoise_1.1.0 glue_1.3.1 compiler_3.5.3
[29] desc_1.2.0 backports_1.1.5 prettyunits_1.0.2
        AAACCTGAGAAACGCC-L8TX_180221_01_F09 AAACCTGAGAAGGTTT-L8TX_180221_01_F09
Xkr4                                      8                                  13
Gm1992                                    0                                   2
Gm37381                                   0                                   0
Rp1                                       0                                   0
Sox17                                     0                                   0
        AAACCTGAGAGTAATC-L8TX_180221_01_F09
Xkr4                                      8
Gm1992                                    0
Gm37381                                   0
Rp1                                       0
Sox17                                     0
                                                      exp_component_name
ACTGAGTAGGTGACCA-L8TX_180221_01_C11 ACTGAGTAGGTGACCA-2L8TX_180221_01_C11
TCAACGACAGACAGGT-L8TX_180221_01_B11 TCAACGACAGACAGGT-1L8TX_180221_01_B11
CTCGTACTCACGACTA-L8TX_180221_01_C11 CTCGTACTCACGACTA-2L8TX_180221_01_C11
CACCAGGCAGGTCTCG-L8TX_180221_01_C11 CACCAGGCAGGTCTCG-2L8TX_180221_01_C11
GACTGCGCAAAGGTGC-L8TX_180221_01_C11 GACTGCGCAAAGGTGC-2L8TX_180221_01_C11
                                    platform_label cluster_color cluster_order
ACTGAGTAGGTGACCA-L8TX_180221_01_C11            10x       #5E5765           355
TCAACGACAGACAGGT-L8TX_180221_01_B11            10x       #5E5765           355
CTCGTACTCACGACTA-L8TX_180221_01_C11            10x       #5E5765           355
CACCAGGCAGGTCTCG-L8TX_180221_01_C11            10x       #5E5765           355
GACTGCGCAAAGGTGC-L8TX_180221_01_C11            10x       #5E5765           355
                                    cluster_label
ACTGAGTAGGTGACCA-L8TX_180221_01_C11       355_V3d
TCAACGACAGACAGGT-L8TX_180221_01_B11       355_V3d
CTCGTACTCACGACTA-L8TX_180221_01_C11       355_V3d
CACCAGGCAGGTCTCG-L8TX_180221_01_C11       355_V3d
GACTGCGCAAAGGTGC-L8TX_180221_01_C11       355_V3d
                                      tsne_1   tsne_2
ACTGAGTAGGTGACCA-L8TX_180221_01_C11 5.027914 19.24911
TCAACGACAGACAGGT-L8TX_180221_01_B11 4.998868 19.24150
CTCGTACTCACGACTA-L8TX_180221_01_C11 5.027664 19.25188
CACCAGGCAGGTCTCG-L8TX_180221_01_C11 4.984898 19.24268
GACTGCGCAAAGGTGC-L8TX_180221_01_C11 5.024889 19.25244
CCATGTCAGCTGCCCA-L8TX_180221_01_C11 4.990352 19.24156
[1] "Adding global attributes..."
[1] "Adding matrix..."
|======================================================================| 100%
[1] "Adding column attributes..."
[1] "Adding default metrics nUMI..."
[1] "Adding default metrics nGene..."
[1] "Adding default embedding..."
[1] "Adding row attributes..."
[1] "Adding columns graphs..."
[1] "Adding row graphs..."
[1] "Adding layers..."
Error: unexpected end of input
Execution halted
but that is really weird, because you used the same matrix.csv file, right?
and I used this code:
library(devtools) # To install packages as SCopeLoomR
devtools::install_github("aertslab/SCopeLoomR")
install.packages("data.table")
library(SCopeLoomR) # to convert the csv file into loom file
library(data.table) # quicker data import with fread() than with read.csv
sessionInfo()
counts <- fread("../data/matrix.csv", sep = ",")
cellnames <- counts[, sample_name]
counts <- t(counts[,2:ncol(counts)])
colnames(counts) <- cellnames
counts[1:5,1:3]
mdata <- read.csv("../data/metadata.csv", row.names=1)
mdata[1:5,1:5]
tsne <- read.csv("../data/tsne_million.csv", row.names=1)
head(tsne)
keepcells <- rownames(tsne)[rownames(tsne) %in% colnames(counts)]
counts <- subset(counts, , keepcells)
build_loom(
  file.name = "million_GABA_SCopeLoomR.loom",
  dgem = counts,
  title = "GABA cells",
  genome = "mouse",
  default.embedding = tsne[colnames(counts),],
  default.embedding.name = "tSNE"
)
loom <- open_loom("million_GABA_SCopeLoomR.loom", mode = "r+")
add_col_attr(loom = loom, key = "percent.mito",
add_col_attr(loom = loom, key = "Cell_Line",
close_loom(loom)
Does it still have to do with the tSNE, as it does not cover all the cells in the counts? Is it possible to build the loom file with the matrix and metadata but without the tSNE? (I don't necessarily need it.) Then maybe it works? I will try that. Best, Laura
hi laura,
i didn't test it on the mouse data you want to use, because they are so large. instead, i downloaded a smaller human data set, hoping that the allen data are all similarly formatted:
https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/matrix.csv
https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/metadata.csv
https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/tsne.csv
so in the human data set, there was this tsne issue, but for the mouse data i don't know, of course. have you tried to run the script on the human data and does that work?
it is possible to build a loom file without tsne. but then you won't see anything in scope, which is not what you want, i guess. (by the way, you don't need to reinstall the packages every time; once you have installed them, you can just load them with library())
i will download the mouse data, when i have time.
katina
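A small sketch of the install-once pattern mentioned above, using base R only. ensure_installed is a hypothetical helper; for a package installed from GitHub, as SCopeLoomR is in this thread, a different installer function would have to be passed in:

```r
# Hypothetical helper: install a package only if it is not already available.
# Returns TRUE if an install was attempted, FALSE if the package was found.
ensure_installed <- function(pkg, install_fun = utils::install.packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  install_fun(pkg)
  invisible(TRUE)
}

# "stats" ships with R, so nothing is installed here.
ensure_installed("stats")
library(stats)
```

In a script this would replace the unconditional install.packages() call at the top, so that repeated runs skip straight to library().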
hi katina, no, I didn't try that, but I will do that now! (True, thank you for the hint!)
thank you so much!
Hi, I downloaded the 3 human data files and ran the code with these files, but I got the same error as with the mouse dataset:
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.13.4 SCopeLoomR_0.10.2 devtools_2.2.1 usethis_1.5.1
code:
library(devtools) # To install packages as SCopeLoomR
library(SCopeLoomR) # to convert the csv file into loom file
library(data.table) # quicker data import with fread() than with read.csv
sessionInfo()
counts <- fread("../data/matrix_human.csv", sep = ",")
cellnames <- counts[, sample_name]
counts <- t(counts[,2:ncol(counts)])
colnames(counts) <- cellnames
counts[1:5,1:3]
mdata <- read.csv("../data/metadata_human.csv", row.names=1)
mdata[1:5,1:5]
tsne <- read.csv("../data/tsne_human.csv", row.names=1)
head(tsne)
keepcells <- rownames(tsne)[rownames(tsne) %in% colnames(counts)]
counts <- subset(counts, , keepcells)
build_loom(
  file.name = "million_human_SCopeLoomR.loom",
  dgem = counts,
  title = "GABA cells",
  genome = "mouse",
  default.embedding = tsne[colnames(counts),],
  default.embedding.name = "tSNE"
)
loom <- open_loom("million_human_SCopeLoomR.loom", mode = "r+")
add_col_attr(loom = loom, key = "percent.mito",
add_col_attr(loom = loom, key = "Cell_Line",
If we are running the same code with the same files, and it is not working in my case but it is working in your case, what could cause this issue? Is it possible that it has something to do with the version of R, as I am using R 3.5.3? (I am running it in a Docker container, in case that is important.)
It had nothing to do with the tSNE file. I tried it without the tSNE and got the same error (plus a warning):
[1] "Adding global attributes..."
[1] "Adding matrix..."
|======================================================================| 100%
[1] "Adding column attributes..."
[1] "Adding default metrics nUMI..."
[1] "Adding default metrics nGene..."
[1] "Adding row attributes..."
[1] "Adding columns graphs..."
[1] "Adding row graphs..."
[1] "Adding layers..."
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  No default embedding set for the loom. You'll not be able to visualize it in SCope.
Error: unexpected end of input
Execution halted
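One thing that may be worth checking: R reports "unexpected end of input" when a script ends while an expression is still open, for example a call with a missing closing parenthesis. Whether that is what happened here cannot be told from the log alone. The sketch below shows how a script can be syntax-checked with parse() before running it; the script content is a made-up example, not the script from this thread.

```r
# Write a deliberately incomplete script to a temporary file (hypothetical content).
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1", "y <- sum(x,"), script)

# parse() only checks the syntax; it does not run the code.
result <- tryCatch({
  parse(file = script)
  "script parses cleanly"
}, error = function(e) conditionMessage(e))

print(result)
```

If the message names a position at the very end of the file, the open expression is usually somewhere earlier, so it helps to look for the last call whose parentheses are not balanced.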
ok, i have made a loom file with the GABAergic cells only: http://scope.aertslab.org/#/22b5f09c-ab5d-4395-a171-c6a3fec4daee/ (go to: user uploaded -> uncategorized). when you type 'label' in the gene box, you can select annotations to colour the plot. please download the loom file (small disk symbol next to the loom file); the session will expire in a few days.
the full data set was too big for my computer. i'd have to try on a different machine. maybe that's why it doesn't work for you.
Thank you so so much!!! That is actually the reason why I am working on a big server, but maybe it is still getting too big when converting into a loom file. Thank you so much for your help!!
Hi,
I am quite new in this field and haven't used loom files before, but now I need to turn a data set (.csv file) into a loom file. First I used LoomR in R using this code:
matrix <- as.matrix(read.csv("../data/Example_expression_dataset.csv", header=T))
print("finished reading, starting to create the loom file")
Example_expression_dataset_loom <- loomR::create(filename="Example_expression_dataset.loom", data = matrix)
I then got an error:
Error in `[[.H5Group`(ra, CA_CELLID) : An object with name CellID does not exist in this group
Calls: start ... readFile -> get_cell_ids -> [[ -> [[.H5Group
This led me to the idea that I might need to create the loom file following the SCopeLoomR tutorial (https://github.com/aertslab/SCopeLoomR/blob/master/vignettes/SCopeLoomR_tutorial.Rmd),
load data
create loom file
so that I can use it with SCopeLoomR (rather than LoomR). I tried that, but then got this error:
Error in counts(sce) : could not find function "counts"
Execution halted
I couldn't find any info about it online, but I am quite new in this area, so maybe there is a quick fix that I am not able to see.
I am happy for any hint!
And another question: as I saw that you have example datasets that are already available to use on the page of the tutorial, may I ask whether it would be possible to create a loom file of this mouse brain dataset (https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-10x), so that I and probably a lot of other scientists can use it with SCopeLoomR? That would really help me, as I am struggling to convert this .csv file into a loom file that I can work with.
Kind regards, Laura
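Regarding the read.csv() step in the loomR attempt above, here is a minimal sketch of one thing to check, under the assumption that the first column of the CSV holds labels (gene or cell names). The actual layout of Example_expression_dataset.csv is not known from this post, and the toy file below is made up.

```r
# Toy CSV with labels in the first column (this layout is an assumption).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("gene,cell1,cell2",
             "geneA,0,3",
             "geneB,5,1"), csv_path)

# Without row.names, the label column is kept as data and as.matrix() returns characters.
m_chr <- as.matrix(read.csv(csv_path, header = TRUE))

# With row.names = 1, the labels become row names and the matrix stays numeric.
m_num <- as.matrix(read.csv(csv_path, header = TRUE, row.names = 1))

print(is.numeric(m_chr))
print(is.numeric(m_num))
```

A character matrix is rarely what an expression-matrix function expects, so checking is.numeric() on the matrix before creating the loom file is a cheap sanity test.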