aertslab / SCopeLoomR

R package (compatible with SCope) to create generic .loom files and extend them with other data e.g.: SCENIC regulons, Seurat clusters and markers, ...
MIT License

problems converting a .csv file into loomfile, which can be manipulated with SCopeLoomR #30

Closed WinklerLaura closed 3 years ago

WinklerLaura commented 3 years ago

Hi,

I am quite new in this field and haven't used loom files before, but now I need to turn a data set (.csv file) into a loom file. First I used loomR in R with this code:

matrix <- as.matrix(read.csv("../data/Example_expression_dataset.csv", header=T))
print("finished reading, starting to create the loom file")
Example_expression_dataset_loom <- loomR::create(filename="Example_expression_dataset.loom", data = matrix)

I then got an error:

Error in [[.H5Group(ra, CA_CELLID) : An object with name CellID does not exist in this group
Calls: start ... readFile -> -> get_cell_ids -> [[ -> [[.H5Group

This led me to the idea that I might need to create the loom file by following the "load data" and "create loom file" steps of the SCopeLoomR tutorial (https://github.com/aertslab/SCopeLoomR/blob/master/vignettes/SCopeLoomR_tutorial.Rmd), so that I can use it with SCopeLoomR (rather than loomR). I tried that, but then got this error:

Error in counts(sce) : could not find function "counts"
Execution halted

I couldn't find any info about it online, but I am quite new in this area, so maybe there is a quick fix that I am not able to see.

I am happy for any hint!

And another question: I saw that you have example datasets that are already available to use on the tutorial page. May I ask whether it would be possible to create a loom file of this mouse brain dataset (https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-10x), so that I and probably a lot of other scientists can use it with SCopeLoomR? That would really help me, as I am struggling to convert this .csv file into a loom file that I can work with.

Kind regards, Laura

dweemx commented 3 years ago

Hi @WinklerLaura,

In order to help you and point you in the right direction, may I ask why you need to generate a loom file? Is it for pre-processing? Or does a particular tool that you want to use need a loom file as input? Is it SCope in particular?

Regarding your other question about the example datasets, do you mean the ones available on SCope? To work with SCope, the datasets need to be processed first. We are developing single-cell pipelines that do this job and generate a SCope-ready loom file. See: https://github.com/vib-singlecell-nf/vsn-pipelines. We also have some documentation: https://vsn-pipelines.readthedocs.io/en/latest/?badge=latest

WinklerLaura commented 3 years ago

Hi @dweemx,

thank you for your fast reply! I am working with an R script that was built to manipulate datasets, for example to do clustering. So far the input files have been datasets that were already loom files. Now I would like to use a dataset from the Allen Brain Atlas. But the files are csv files, and in order to use them in the R scripts, I need to create a loom file from the csv file. The R script uses the R package SCopeLoomR for further analysis.

If I understand you correctly, you mean I cannot convert a csv file into a loom file to use it with the package but have to preprocess the data in the csv file first?

dweemx commented 3 years ago

No no, you don't need to pre-process it first since you already have some R scripts that require as input a loom file. I'll make a small snippet to convert a csv to loom file using SCopeLoomR. I'll try to make this today

WinklerLaura commented 3 years ago

Thank you so much!! That would be amazing!

tropfenameimer commented 3 years ago

hi laura,

i can't recapitulate your SCopeLoomR problem, because you haven't given the exact code (a minimal example would have been useful). here is how i would create a simple loom file with an embedding and a bit of meta data:

library(SCopeLoomR)
counts <- read.table("raw_counts.tsv")
mdata <- read.table("meta_data.tsv")
tsne <- read.table("tsne.tsv")

build_loom( file.name = "test.loom",
        dgem = counts,
        title = "test example",
        genome = "human",
        default.embedding = tsne[colnames(counts),],
        default.embedding.name = "tSNE"
)

## open the loom file to modify it
loom <- open_loom("test.loom", mode = "r+")

## now add meta data
add_col_attr(loom = loom, key = "percent.mito",
        value = mdata[colnames(counts), "percent.mito"], as.metric = T)
add_col_attr(loom = loom, key = "Cell_Line",
        value = mdata[colnames(counts), "Cell_Line"], as.annotation = T)

## don't forget to close the loom file
close_loom(loom)

sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /vsc-hard-mounts/leuven-apps/skylake/2018a/software/OpenBLAS/0.2.20-GCC-6.4.0-2.28/lib/libopenblas_haswellp-r0.2.20.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] SCopeLoomR_0.10.2
loaded via a namespace (and not attached):
 [1] magrittr_1.5      usethis_1.6.1     devtools_2.3.0    bit_4.0.4
 [5] pkgload_1.1.0     rjson_0.2.20      R6_2.5.0          rlang_0.4.8
 [9] fansi_0.4.1       tools_4.0.0       pkgbuild_1.1.0    sessioninfo_1.1.1
[13] cli_2.1.0         hdf5r_1.3.3       withr_2.3.0       ellipsis_0.3.1
[17] remotes_2.2.0     bit64_4.0.5       assertthat_0.2.1  digest_0.6.27
[21] rprojroot_1.3-2   crayon_1.3.4      processx_3.4.4    callr_3.5.1
[25] fs_1.4.1          ps_1.4.0          testthat_3.0.0    curl_4.3
[29] memoise_1.1.0     glue_1.4.2        compiler_4.0.0    desc_1.2.0
[33] backports_1.2.0   prettyunits_1.1.1

if this code doesn't work for you, please post an example with (a subset of) your data.

as for your second question about the mouse brain data set, there is a possibility to explore it on this website: https://celltypes.brain-map.org/rnaseq/mouse_ctx-hip_10x and with the code above it should be straightforward to download the counts, the meta data and the UMAP and make a loom file yourself.

WinklerLaura commented 3 years ago

Hi tropfenameimer,

I think now with this example I understand how to do it! thank you so much for your help!

dweemx commented 3 years ago

Closing this issue since it seems to have been solved.

WinklerLaura commented 3 years ago

Hi, I got a few issues that I was able to solve, but now I am stuck again, as it is giving me this error:

Error in get_dtype(x = dgem[1, 1]) : Unknown data type: factor
Calls: build_loom ... tryCatch -> tryCatchList -> tryCatchOne -> 
Execution halted

Infos:

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] SCopeLoomR_0.10.2 devtools_2.2.1    usethis_1.5.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5        magrittr_2.0.1    bit_4.0.4         pkgload_1.0.2
 [5] R6_2.5.0          rlang_0.4.1       tools_3.5.3       pkgbuild_1.0.6
 [9] sessioninfo_1.1.1 cli_1.1.0         hdf5r_1.3.3       withr_2.1.2
[13] ellipsis_0.3.0    remotes_2.1.0     bit64_4.0.5       assertthat_0.2.1
[17] digest_0.6.22     rprojroot_1.3-2   crayon_1.3.4      processx_3.4.1
[21] callr_3.3.2       fs_1.3.1          ps_1.3.0          curl_4.2
[25] testthat_2.2.1    memoise_1.1.0     glue_1.3.1        compiler_3.5.3
[29] desc_1.2.0        backports_1.1.5   prettyunits_1.0.2

code:

sessionInfo()

counts <- read.table("..data/matrix_GABA.csv")
mdata <- read.table("..data/metadata_GABA.csv")
tsne <- read.table("..data/tsne_GABA.csv")

build_loom( file.name = "million_GABA_SCopeLoomR.loom",
        dgem = counts,
        title = "GABA cells",
        genome = "mouse",
        default.embedding = tsne[colnames(counts),],
        default.embedding.name = "tSNE"
)

print('loomfile is successfully created')

open the loom file to modify it

loom <- open_loom("test.loom", mode = "r+")

now add meta data

add_col_attr(loom = loom, key = "percent.mito",

value = mdata[colnames(counts), "percent.mito"], as.metric = T)

add_col_attr(loom = loom, key = "Cell_Line",

value = mdata[colnames(counts), "Cell_Line"], as.annotation = T)

don't forget to close the loom file

close_loom(loom)

I have tried to solve this for days now, but I didn't get any further. Maybe you can tell me what I am doing wrong? Or where I can search for answers to solve an issue like this? Does the error mean that I have a wrong data type in the dataset (factor)? Sorry for bothering you again. Kind regards, Laura

dweemx commented 3 years ago

Hi @WinklerLaura, I think the error is due to this line:

counts <- read.table("..data/matrix_GABA.csv")

I don't know what your csv file looks like, but I'd use the following command:

counts <- read.table("..data/matrix_GABA.csv", header=TRUE, row.names=1, quote='', stringsAsFactors=FALSE)

Also good to check afterwards if the data matrix was correctly read by inspecting counts
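One possible contributor to the "factor" error, not confirmed anywhere in this thread: read.table() splits on whitespace by default, so a comma-separated file can come back as a single text column, and R versions before 4.0 turn text columns into factors unless stringsAsFactors=FALSE is set. The sketch below uses a made-up three-line CSV (the cell names are placeholders) to show the difference and a quick numeric check to run before calling build_loom().

```r
# Toy CSV standing in for a counts file (cell names are placeholders).
csv_text <- "gene,cell1,cell2\nXkr4,8,13\nSox17,0,2"

# Default sep = "" splits on whitespace, so each line stays one text field.
bad <- read.table(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Telling read.table() about the comma yields one numeric column per cell.
good <- read.table(text = csv_text, header = TRUE, row.names = 1,
                   sep = ",", stringsAsFactors = FALSE)

# Quick check that every column of the parsed table is numeric.
all_numeric <- all(vapply(good, is.numeric, logical(1)))
print(dim(bad))
print(all_numeric)
```

If the check prints FALSE for a real file, inspecting str(counts) usually shows which column was read as text.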

WinklerLaura commented 3 years ago

Thank you so much, I am trying it right now!

WinklerLaura commented 3 years ago

|======================================================================| 100%
[1] "Adding column attributes..."
Error in FUN(left, right) : non-numeric argument to binary operator
Calls: build_loom ... tryCatch -> tryCatchList -> tryCatchOne -> 
Execution halted

still getting an error =/

dweemx commented 3 years ago

That's not very helpful as a log. I'll need a full example with the matrix to reproduce your error. Could you provide that?

WinklerLaura commented 3 years ago

oh ok, I understand.

These are my infos of the computer used and R:

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] SCopeLoomR_0.10.2 devtools_2.2.1    usethis_1.5.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5        magrittr_2.0.1    bit_4.0.4         pkgload_1.0.2
 [5] R6_2.5.0          rlang_0.4.1       tools_3.5.3       pkgbuild_1.0.6
 [9] sessioninfo_1.1.1 cli_1.1.0         hdf5r_1.3.3       withr_2.1.2
[13] ellipsis_0.3.0    remotes_2.1.0     bit64_4.0.5       assertthat_0.2.1
[17] digest_0.6.22     rprojroot_1.3-2   crayon_1.3.4      processx_3.4.1
[21] callr_3.3.2       fs_1.3.1          ps_1.3.0          curl_4.2
[25] testthat_2.2.1    memoise_1.1.0     glue_1.3.1        compiler_3.5.3
[29] desc_1.2.0        backports_1.1.5   prettyunits_1.0.2

[1] "Adding global attributes..."
[1] "Adding matrix..."
|======================================================================| 100%
[1] "Adding column attributes..."
Error in FUN(left, right) : non-numeric argument to binary operator
Calls: build_loom ... tryCatch -> tryCatchList -> tryCatchOne -> 
Execution halted

The matrix I used was the one from the Allen Brain Atlas: https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-10x. There I used these 3 files:

Afterwards, I followed the instructions from tropfenameimer using this Rscript:

Load libraries

library(devtools) # To install packages as SCopeLoomR
devtools::install_github("aertslab/SCopeLoomR")
library(SCopeLoomR) # to convert the csv file into loom file

sessionInfo()

counts <- read.table("../data/matrix_GABA.csv", header=TRUE, row.names=1, qu$
mdata <- read.table("../data/metadata_GABA.csv", header=TRUE, row.names=1, q$
tsne <- read.table("../data/tsne_GABA.csv", header=TRUE, row.names=1, quote=$

build_loom( file.name = "million_GABA_SCopeLoomR.loom",
        dgem = counts,
        title = "GABA cells",
        genome = "mouse",
        default.embedding = tsne[colnames(counts),],
        default.embedding.name = "tSNE"
)

print('loomfile is successfully created')

open the loom file to modify it

loom <- open_loom("test.loom", mode = "r+")

now add meta data

add_col_attr(loom = loom, key = "percent.mito",

value = mdata[colnames(counts), "percent.mito"], as.metric = T)

add_col_attr(loom = loom, key = "Cell_Line",

value = mdata[colnames(counts), "Cell_Line"], as.annotation = T)

don't forget to close the loom file

close_loom(loom)

Thank you so much for helping me!

tropfenameimer commented 3 years ago

hi laura, since you didn't send a reproducible example, i made one with data from the allen brain map (not the large mouse data set, but a smaller one). an issue with the allen data might be that the expression matrix is not in gene x cell format. but i haven't verified this for the data set you want to process.

can you please try to reproduce the example below and in case there are issues post your complete code with error messages here?

## download data from allen brain map (you can do this manually)
wget https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/matrix.csv 
wget https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/metadata.csv 
wget https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/tsne.csv

## in R: create loom file:
library(SCopeLoomR)
library(data.table)     #quicker data import with fread() than with read.csv()

## import counts matrix -> transpose to gene x cell format
counts <- fread("matrix.csv", sep = ",")
cellnames <- counts[, sample_name] 
counts <- t(counts[,2:ncol(counts)])
colnames(counts) <- cellnames

counts[1:5,1:3]
#        F2S4_160113_027_A01 F2S4_160113_027_B01 F2S4_160113_027_C01
#3.8-1.2                   0                   0                   0
#3.8-1.3                   0                   0                   0
#3.8-1.4                   0                   0                   0
#3.8-1.5                   0                   0                   0
#5-HT3C2                   0                   0                   0

mdata <- read.csv("metadata.csv", row.names = 1)
mdata[1:5,1:5]
#                     exp_component_name specimen_type cluster_color
#F2S4_160113_027_A01 LS-15005h_S01_E1-50       nucleus
#F2S4_160113_027_B01 LS-15005h_S02_E1-50       nucleus       #E170FE
#F2S4_160113_027_C01 LS-15005h_S03_E1-50       nucleus       #8E5864
#F2S4_160113_027_D01 LS-15005h_S04_E1-50       nucleus       #8B5862
#F2S4_160113_027_E01 LS-15005h_S05_E1-50       nucleus       #CF6EC9
#                    cluster_order         cluster_label
#F2S4_160113_027_A01            NA                  
#F2S4_160113_027_B01            32     Inh L2-5 VIP TOX2
#F2S4_160113_027_C01             2    Inh L1 LAMP5 GGT8P
#F2S4_160113_027_D01             1     Inh L1 LAMP5 NDNF
#F2S4_160113_027_E01            34 Inh L1-3 VIP ZNF322P1

tsne <- read.csv("tsne.csv", row.names = 1)
head(tsne)
#                      tsne_1     tsne_2
#F2S4_160113_027_B01 32.15014   4.800442
#F2S4_160113_027_C01 25.06135 -27.205498
#F2S4_160113_027_D01 25.36504 -25.813437
#F2S4_160113_027_E01 27.07534  13.713614
#F2S4_160113_027_F01 27.66046   2.411001
#F2S4_160113_027_G01 -8.20726 -33.097583

## the tsne data don't contain all cells that are present in counts & metadata
## -> remove missing cells from counts data
keepcells <- rownames(tsne)[rownames(tsne) %in% colnames(counts)]
counts <- subset(counts, , keepcells)

build_loom( file.name = "test.loom",
        dgem = counts,
        title = "test example",
        genome = "human",
        default.embedding = tsne[colnames(counts),],
        default.embedding.name = "tSNE"
)

#Warning message:
#In if (class(x = dgem) == "dgTMatrix") { :
#  the condition has length > 1 and only the first element will be used
#-> you can ignore this warning

loom <- open_loom("test.loom", mode = "r+")
add_col_attr(loom = loom, key = "cluster_label",
        value = mdata[colnames(counts), "cluster_label"], as.annotation = T)
close_loom(loom)

sessionInfo()
#R version 4.0.0 (2020-04-24)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: CentOS Linux 7 (Core)
#
#Matrix products: default
#BLAS/LAPACK: /vsc-hard-mounts/leuven-apps/skylake/2018a/software/OpenBLAS/0.2.20-GCC-6.4.0-2.28/lib/libopenblas_haswellp-r0.2.20.so
#
#locale:
#[1] C
#
#attached base packages:
#[1] stats     graphics  grDevices utils     datasets  methods   base     
#
#other attached packages:
#[1] data.table_1.13.2 SCopeLoomR_0.10.2
#
#loaded via a namespace (and not attached):
#[1] bit_4.0.4      compiler_4.0.0 rjson_0.2.20   R6_2.5.0       tools_4.0.0   
#[6] hdf5r_1.3.3    bit64_4.0.5  
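
The reshaping steps in the example above (cell x gene table to gene x cell matrix, then dropping cells that have no embedding) can be tried on a toy table without downloading anything. This is only a base-R sketch with made-up names and values; it assumes the cell names sit in a column called sample_name, as in the human matrix.csv.

```r
# Toy cell x gene table: one row per cell, cell name in "sample_name".
# All names and values are made up for illustration.
raw <- data.frame(
  sample_name = c("cellA", "cellB", "cellC"),
  GeneX = c(0, 3, 1),
  GeneY = c(5, 0, 2),
  stringsAsFactors = FALSE
)

# Drop the name column, transpose to gene x cell, and label the columns.
counts <- t(as.matrix(raw[, setdiff(names(raw), "sample_name")]))
colnames(counts) <- raw$sample_name

# Toy embedding that is missing one cell, as in the human data set.
tsne <- data.frame(tsne_1 = c(1.5, -2.0), tsne_2 = c(0.3, 4.1),
                   row.names = c("cellB", "cellC"))

# Keep only cells present in both objects, in the same order.
keepcells <- intersect(colnames(counts), rownames(tsne))
counts <- counts[, keepcells, drop = FALSE]
tsne <- tsne[keepcells, , drop = FALSE]

# The embedding rows and the count columns should now line up exactly.
stopifnot(identical(colnames(counts), rownames(tsne)))
print(dim(counts))
```

Subsetting both objects with the same keepcells vector keeps the cell order identical, which is what the tsne[colnames(counts),] indexing in the build_loom() call relies on.
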
WinklerLaura commented 3 years ago

Hi, I have tried the code but get another error. I would like to show a reproducible sample, how can I do that with a file of that size? Would it help to show it to you through the head() function -> for example the first two rows? Or to show you the script I have used to filter the file?

Load libraries

library(devtools) # To install packages as SCopeLoomR
devtools::install_github("aertslab/SCopeLoomR")
install.packages("data.table")
library(SCopeLoomR) # to convert the csv file into loom file
library(data.table) # quicker data import with fread() than with read.csv

import counts matrix -> transpose to gene x cell format

counts <- fread("../data/matrix_GABA.csv", sep = ",")
cellnames <- counts[, sample_name]
counts <- t(counts[,2:ncol(counts)])
colnames(counts) <- cellnames

counts[1:5,1:3]

mdata <- read.csv("../data/metadata_GABA.csv", row.names=1)
mdata[1:5,1:5]

tsne <- read.csv("../data/tsne_GABA.csv", row.names=1)
head(tsne)

the tsne data don't contain all cells that are present in counts & metadata

-> remove missing cells from counts data

keepcells <- rownames(tsne)[rownames(tsne) %in% colnames(counts)]
counts <- subset(counts, , keepcells)

build_loom( file.name = "million_GABA_SCopeLoomR.loom",
        dgem = counts,
        title = "GABA cells",
        genome = "mouse",
        default.embedding = tsne[colnames(counts),],
        default.embedding.name = "tSNE"
)

Warning message:

In if (class(x = dgem) == "dgTMatrix") { :

the condition has length > 1 and only the first element will be used

-> you can ignore this warning

open the loom file to modify it

loom <- open_loom("test.loom", mode = "r+")

now add meta data

add_col_attr(loom = loom, key = "percent.mito",

value = mdata[colnames(counts), "percent.mito"], as.metric = T)

add_col_attr(loom = loom, key = "Cell_Line",

value = mdata[colnames(counts), "Cell_Line"], as.annotation = T)

don't forget to close the loom file

close_loom(loom)

sessionInfo()

Error message:

Error in [.data.table(counts, , sample_name) : j (the 2nd argument inside [...]) is a single symbol but column name 'sample_name' is not found. Perhaps you intended DT[, ..sample_name]. This difference to data.frame is deliberate and explained in FAQ 1.1.
Calls: [ -> [.data.table
Execution halted

as you said you need to see the whole error message:

The downloaded source packages are in '/tmp/RtmpsdzVWN/downloaded_packages'

Attaching package: 'SCopeLoomR'

The following object is masked from 'package:base':

flush

Error in [.data.table(counts, , sample_name) : j (the 2nd argument inside [...]) is a single symbol but column name 'sample_name' is not found. Perhaps you intended DT[, ..sample_name]. This difference to data.frame is deliberate and explained in FAQ 1.1.
Calls: [ -> [.data.table
Execution halted


Please let me know, if I can give you more info about the dataset I used or if you need any other information! Thank you so much! Laura

tropfenameimer commented 3 years ago

hi laura,

the matrix.csv i downloaded contains a column named "sample_name". you are trying to run my code on another data set, and i guess your data frame "matrix_GABA.csv" does not contain such a column. hence the error message. can you please post an excerpt of all data frames you are using (e.g. counts[1:5,1:3])?
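A small guard can make this kind of mismatch obvious before the data.table error appears. The helper below is hypothetical (it is not part of SCopeLoomR or data.table) and the toy tables are made up; it only assumes that the cell names are expected in a column called sample_name.

```r
# Hypothetical helper (not part of SCopeLoomR or data.table): return the
# cell names from an imported table, failing early with a readable message.
get_cell_names <- function(tbl, id_col = "sample_name") {
  if (!id_col %in% names(tbl)) {
    stop(sprintf("column '%s' not found; first columns are: %s",
                 id_col, paste(head(names(tbl), 3), collapse = ", ")))
  }
  as.character(tbl[[id_col]])
}

# A table read from a file without a header row gets V1, V2, ... names.
no_header <- data.frame(V1 = c("cellA", "cellB"), V2 = c(9, 13))
with_header <- data.frame(sample_name = c("cellA", "cellB"), Xkr4 = c(9, 13))

msg <- tryCatch(get_cell_names(no_header), error = conditionMessage)
print(msg)
print(get_cell_names(with_header))
```

Because tbl[[id_col]] works on both data.frame and data.table objects, the same check can be used after read.csv() or fread().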

katina

WinklerLaura commented 3 years ago

Hi Katina,

I think I know now what I did wrong. These are the used datasets:

[1] "matrix:"
                                    V1 V2 V3
1: AAAGTAGGTGAGGCTA-L8TX_180221_01_F09  9  0
2: ACCTTTATCGACAGCC-L8TX_180221_01_F09 13  0
3: ACCTTTATCTTCTGGC-L8TX_180221_01_F09  8  1
4: ACGAGCCAGATATGGT-L8TX_180221_01_F09  3  1
5: ACGAGCCAGCTGAACG-L8TX_180221_01_F09  8  0
[1] "metadata:"
                                    CCATGTCAGCGCTTAT.2L8TX_180221_01_C11 X10x
TGGTTCCTCACGACTA-L8TX_180221_01_E11 TGGTTCCTCACGACTA-4L8TX_180221_01_E11  10x
CTACGTCCAACTGGCC-L8TX_181011_01_B03 CTACGTCCAACTGGCC-6L8TX_181011_01_B03  10x
CTAGAGTCACGCGAAA-L8TX_171026_01_B05 CTAGAGTCACGCGAAA-6L8TX_171026_01_B05  10x
ATAGACCTCCCAAGAT-L8TX_180221_01_D09 ATAGACCTCCCAAGAT-4L8TX_180221_01_D09  10x
GTTCGGGCAATCGAAA-L8TX_181206_01_B12 GTTCGGGCAATCGAAA-3L8TX_181206_01_B12  10x
                                    X.C60C0F X2 X2_Meis2
TGGTTCCTCACGACTA-L8TX_180221_01_E11  #8C5962  7  7_Lamp5
CTACGTCCAACTGGCC-L8TX_181011_01_B03  #8C5962  7  7_Lamp5
CTAGAGTCACGCGAAA-L8TX_171026_01_B05  #8C5962  7  7_Lamp5
ATAGACCTCCCAAGAT-L8TX_180221_01_D09  #8C5962  7  7_Lamp5
GTTCGGGCAATCGAAA-L8TX_181206_01_B12  #8C5962  7  7_Lamp5
[1] "tsne:"
                                    X5.02848805333333        X19.2537504.56 X2
TGGTTCCTCACGACTA-L8TX_180221_01_E11         -9.515233       4.96538642\t472  2
CTACGTCCAACTGGCC-L8TX_181011_01_B03        -10.224533 5.32504034666667\t473  2
CTAGAGTCACGCGAAA-L8TX_171026_01_B05        -10.942459 6.26142136666667\t474  2
ATAGACCTCCCAAGAT-L8TX_180221_01_D09         -8.635556       5.56036584\t475  2
GTTCGGGCAATCGAAA-L8TX_181206_01_B12         -9.835173        6.1383559\t476  2
GACGTGCCACACCGAC-L8TX_171026_01_G04         -9.510277 7.67432181333333\t477  2

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

I filtered all three files for "GABAergic" in a specific column (I think it was class_label of the metadata). But I've now realized that of course this would filter out the first row (with all the headers like "sample_name") as well. So, if I correct the filtering (keep the headers), then it should work with your code (above), I guess. Sorry for the stupid mistake, I am really new to all this. Thank you so much for your help!! I will correct it immediately and tell you if it works then!
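
How the files were filtered is not shown in the thread. As a sketch only, assuming the filtering is done inside R rather than on the raw text lines, subsetting rows of a data frame leaves the column names in place, and write.csv() writes the header line again. The tables and values below are made up; class_label and sample_name are the column names mentioned above.

```r
# Toy metadata and counts tables (all values are made up).
mdata <- data.frame(
  sample_name = c("cellA", "cellB", "cellC"),
  class_label = c("GABAergic", "Glutamatergic", "GABAergic"),
  stringsAsFactors = FALSE
)
counts <- data.frame(
  sample_name = c("cellA", "cellB", "cellC"),
  Xkr4 = c(8, 13, 0),
  Sox17 = c(0, 2, 0),
  stringsAsFactors = FALSE
)

# Select cells by label, then subset rows; column names are untouched.
gaba_cells <- mdata$sample_name[mdata$class_label == "GABAergic"]
counts_gaba <- counts[counts$sample_name %in% gaba_cells, , drop = FALSE]

# write.csv() writes the header line, so a round trip keeps "sample_name".
out_file <- tempfile(fileext = ".csv")
write.csv(counts_gaba, out_file, row.names = FALSE)
reread <- read.csv(out_file, stringsAsFactors = FALSE)
print(names(reread))
```

Filtering the metadata first and then selecting the matching cells in the other tables also keeps the three files consistent with each other.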

WinklerLaura commented 3 years ago

Hi, so I tried different things, as I still got some errors. So that I can be 100% sure that it is not due to my preprocessing, I now took all three files directly from the Allen Brain Map and ran the script above, but it tells me "unexpected end of input":

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.13.4 SCopeLoomR_0.10.2 devtools_2.2.1    usethis_1.5.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5        magrittr_2.0.1    bit_4.0.4         pkgload_1.0.2
 [5] R6_2.5.0          rlang_0.4.1       tools_3.5.3       pkgbuild_1.0.6
 [9] sessioninfo_1.1.1 cli_1.1.0         hdf5r_1.3.3       withr_2.1.2
[13] ellipsis_0.3.0    remotes_2.1.0     bit64_4.0.5       assertthat_0.2.1
[17] digest_0.6.22     rprojroot_1.3-2   crayon_1.3.4      processx_3.4.1
[21] callr_3.3.2       fs_1.3.1          ps_1.3.0          curl_4.2
[25] testthat_2.2.1    memoise_1.1.0     glue_1.3.1        compiler_3.5.3
[29] desc_1.2.0        backports_1.1.5   prettyunits_1.0.2

        AAACCTGAGAAACGCC-L8TX_180221_01_F09 AAACCTGAGAAGGTTT-L8TX_180221_01_F09
Xkr4                                      8                                  13
Gm1992                                    0                                   2
Gm37381                                   0                                   0
Rp1                                       0                                   0
Sox17                                     0                                   0
        AAACCTGAGAGTAATC-L8TX_180221_01_F09
Xkr4                                      8
Gm1992                                    0
Gm37381                                   0
Rp1                                       0
Sox17                                     0
                                                      exp_component_name
ACTGAGTAGGTGACCA-L8TX_180221_01_C11 ACTGAGTAGGTGACCA-2L8TX_180221_01_C11
TCAACGACAGACAGGT-L8TX_180221_01_B11 TCAACGACAGACAGGT-1L8TX_180221_01_B11
CTCGTACTCACGACTA-L8TX_180221_01_C11 CTCGTACTCACGACTA-2L8TX_180221_01_C11
CACCAGGCAGGTCTCG-L8TX_180221_01_C11 CACCAGGCAGGTCTCG-2L8TX_180221_01_C11
GACTGCGCAAAGGTGC-L8TX_180221_01_C11 GACTGCGCAAAGGTGC-2L8TX_180221_01_C11
                                    platform_label cluster_color cluster_order
ACTGAGTAGGTGACCA-L8TX_180221_01_C11            10x       #5E5765           355
TCAACGACAGACAGGT-L8TX_180221_01_B11            10x       #5E5765           355
CTCGTACTCACGACTA-L8TX_180221_01_C11            10x       #5E5765           355
CACCAGGCAGGTCTCG-L8TX_180221_01_C11            10x       #5E5765           355
GACTGCGCAAAGGTGC-L8TX_180221_01_C11            10x       #5E5765           355
                                    cluster_label
ACTGAGTAGGTGACCA-L8TX_180221_01_C11       355_V3d
TCAACGACAGACAGGT-L8TX_180221_01_B11       355_V3d
CTCGTACTCACGACTA-L8TX_180221_01_C11       355_V3d
CACCAGGCAGGTCTCG-L8TX_180221_01_C11       355_V3d
GACTGCGCAAAGGTGC-L8TX_180221_01_C11       355_V3d
                                      tsne_1   tsne_2
ACTGAGTAGGTGACCA-L8TX_180221_01_C11 5.027914 19.24911
TCAACGACAGACAGGT-L8TX_180221_01_B11 4.998868 19.24150
CTCGTACTCACGACTA-L8TX_180221_01_C11 5.027664 19.25188
CACCAGGCAGGTCTCG-L8TX_180221_01_C11 4.984898 19.24268
GACTGCGCAAAGGTGC-L8TX_180221_01_C11 5.024889 19.25244
CCATGTCAGCTGCCCA-L8TX_180221_01_C11 4.990352 19.24156
[1] "Adding global attributes..."
[1] "Adding matrix..."
|======================================================================| 100%
[1] "Adding column attributes..."
[1] "Adding default metrics nUMI..."
[1] "Adding default metrics nGene..."
[1] "Adding default embedding..."
[1] "Adding row attributes..."
[1] "Adding columns graphs..."
[1] "Adding row graphs..."
[1] "Adding layers..."
Error: unexpected end of input
Execution halted

but that is really weird, because you used the same matrix.csv file, right?

WinklerLaura commented 3 years ago

and I used this code:

Load libraries

library(devtools) # To install packages as SCopeLoomR
devtools::install_github("aertslab/SCopeLoomR")
install.packages("data.table")
library(SCopeLoomR) # to convert the csv file into loom file
library(data.table) # quicker data import with fread() than with read.csv

sessionInfo()

import counts matrix -> transpose to gene x cell format

counts <- fread("../data/matrix.csv", sep = ",")
cellnames <- counts[, sample_name]
counts <- t(counts[,2:ncol(counts)])
colnames(counts) <- cellnames

counts[1:5,1:3]

mdata <- read.csv("../data/metadata.csv", row.names=1)
mdata[1:5,1:5]

tsne <- read.csv("../data/tsne_million.csv", row.names=1)
head(tsne)

the tsne data don't contain all cells that are present in counts & metadata

-> remove missing cells from counts data

keepcells <- rownames(tsne)[rownames(tsne) %in% colnames(counts)]
counts <- subset(counts, , keepcells)

build_loom( file.name = "million_GABA_SCopeLoomR.loom",
        dgem = counts,
        title = "GABA cells",
        genome = "mouse",
        default.embedding = tsne[colnames(counts),],
        default.embedding.name = "tSNE"
)

Warning message:

In if (class(x = dgem) == "dgTMatrix") { :

the condition has length > 1 and only the first element will be used

open the loom file to modify it

loom <- open_loom("million_GABA_SCopeLoomR.loom", mode = "r+")

now add meta data

add_col_attr(loom = loom, key = "percent.mito",

value = mdata[colnames(counts), "percent.mito"], as.metric = T)

add_col_attr(loom = loom, key = "Cell_Line",

value = mdata[colnames(counts), "Cell_Line"], as.annotation = T)

don't forget to close the loom file

close_loom(loom)

Does it have to do with the tSNE still, as it does not comprise all the counts? Is it possible to build the loom file with matrix and metadata without tSNE? (I don't necessarily need it). Then maybe it works? I will try that. Best, Laura

tropfenameimer commented 3 years ago

hi laura,

i didn't test it on the mouse data you want to use, because they are so large. instead, i downloaded a smaller human data set, hoping that the allen data are all similarly formatted:

https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/matrix.csv
https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/metadata.csv
https://idk-etl-prod-download-bucket.s3.amazonaws.com/aibs_human_ctx_smart-seq/tsne.csv

so in the human data set, there was this tsne issue, but for the mouse data i don't know, of course. have you tried to run the script on the human data and does that work?

it is possible to build a loom file without tsne. but then you won't see anything in scope, which is not what you want, i guess. (by the way, you don't need to reinstall the packages every time; once you have installed them, you can just load them with library())
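
One way to avoid reinstalling on every run is to check which packages are missing before calling the installer. The helper below is a hypothetical sketch in base R; the package names in the call are only examples, and the install line is left commented out.

```r
# Hypothetical helper: report which packages are not installed yet,
# so that installation only runs when something is actually missing.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" and "utils" ship with R, so nothing should be reported here.
to_install <- missing_packages(c("stats", "utils"))
if (length(to_install) > 0) {
  # install.packages(to_install)  # uncomment to install from CRAN
  message("Missing: ", paste(to_install, collapse = ", "))
}
print(length(to_install))
```

SCopeLoomR itself is installed from GitHub in this thread, so for that package the guarded call would be devtools::install_github("aertslab/SCopeLoomR") rather than install.packages().
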

i will download the mouse data, when i have time.

katina

WinklerLaura commented 3 years ago

hi katina, no, I didn't try that, but I will do that now! (True, thank you for the hint!)

thank you so much!

WinklerLaura commented 3 years ago

Hi, I downloaded the 3 human datafiles and ran the code with these files, but I got the same error as with the mouse dataset:

R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.13.4 SCopeLoomR_0.10.2 devtools_2.2.1    usethis_1.5.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5        magrittr_2.0.1    bit_4.0.4         pkgload_1.0.2
 [5] R6_2.5.0          rlang_0.4.1       tools_3.5.3       pkgbuild_1.0.6
 [9] sessioninfo_1.1.1 cli_1.1.0         hdf5r_1.3.3       withr_2.1.2
[13] ellipsis_0.3.0    remotes_2.1.0     bit64_4.0.5       assertthat_0.2.1
[17] digest_0.6.22     rprojroot_1.3-2   crayon_1.3.4      processx_3.4.1
[21] callr_3.3.2       fs_1.3.1          ps_1.3.0          testthat_2.2.1
[25] memoise_1.1.0     glue_1.3.1        compiler_3.5.3    desc_1.2.0
[29] backports_1.1.5   prettyunits_1.0.2

        F2S4_160113_027_A01 F2S4_160113_027_B01 F2S4_160113_027_C01
3.8-1.2                   0                   0                   0
3.8-1.3                   0                   0                   0
3.8-1.4                   0                   0                   0
3.8-1.5                   0                   0                   0
5-HT3C2                   0                   0                   0
                     exp_component_name specimen_type cluster_color
F2S4_160113_027_A01 LS-15005h_S01_E1-50       nucleus
F2S4_160113_027_B01 LS-15005h_S02_E1-50       nucleus       #E170FE
F2S4_160113_027_C01 LS-15005h_S03_E1-50       nucleus       #8E5864
F2S4_160113_027_D01 LS-15005h_S04_E1-50       nucleus       #8B5862
F2S4_160113_027_E01 LS-15005h_S05_E1-50       nucleus       #CF6EC9
                    cluster_order         cluster_label
F2S4_160113_027_A01            NA
F2S4_160113_027_B01            32     Inh L2-5 VIP TOX2
F2S4_160113_027_C01             2    Inh L1 LAMP5 GGT8P
F2S4_160113_027_D01             1     Inh L1 LAMP5 NDNF
F2S4_160113_027_E01            34 Inh L1-3 VIP ZNF322P1
                      tsne_1     tsne_2
F2S4_160113_027_B01 32.15014   4.800442
F2S4_160113_027_C01 25.06135 -27.205498
F2S4_160113_027_D01 25.36504 -25.813437
F2S4_160113_027_E01 27.07534  13.713614
F2S4_160113_027_F01 27.66046   2.411001
F2S4_160113_027_G01 -8.20726 -33.097583
[1] "Adding global attributes..."
[1] "Adding matrix..."
|======================================================================| 100%
[1] "Adding column attributes..."
[1] "Adding default metrics nUMI..."
[1] "Adding default metrics nGene..."
[1] "Adding default embedding..."
[1] "Adding row attributes..."
[1] "Adding columns graphs..."
[1] "Adding row graphs..."
[1] "Adding layers..."
Error: unexpected end of input
Execution halted

code:

Load libraries

library(devtools) # To install packages as SCopeLoomR

devtools::install_github("aertslab/SCopeLoomR")

install.packages("data.table")

library(SCopeLoomR) # to convert the csv file into loom file
library(data.table) # quicker data import with fread() than with read.csv

sessionInfo()

import counts matrix -> transpose to gene x cell format

counts <- fread("../data/matrix_human.csv", sep = ",")
cellnames <- counts[, sample_name]
counts <- t(counts[,2:ncol(counts)])
colnames(counts) <- cellnames

counts[1:5,1:3]

mdata <- read.csv("../data/metadata_human.csv", row.names=1)
mdata[1:5,1:5]

tsne <- read.csv("../data/tsne_human.csv", row.names=1)
head(tsne)

the tsne data don't contain all cells that are present in counts & metadata

-> remove missing cells from counts data

keepcells <- rownames(tsne)[rownames(tsne) %in% colnames(counts)]
counts <- subset(counts, , keepcells)

build_loom( file.name = "million_human_SCopeLoomR.loom",
        dgem = counts,
        title = "GABA cells",
        genome = "mouse",
        default.embedding = tsne[colnames(counts),],
        default.embedding.name = "tSNE"
)

Warning message:

In if (class(x = dgem) == "dgTMatrix") { :

the condition has length > 1 and only the first element will be used

-> you can ignore this warning

open the loom file to modify it

loom <- open_loom("million_human_SCopeLoomR.loom", mode = "r+")

now add meta data

add_col_attr(loom = loom, key = "percent.mito",

value = mdata[colnames(counts), "percent.mito"], as.metric = T)

add_col_attr(loom = loom, key = "Cell_Line",

value = mdata[colnames(counts), "Cell_Line"], as.annotation = T)

don't forget to close the loom file

close_loom(loom)

if we are running the same code, with the same files and it is not working in my case, but it is working your case, what could cause this issue? Is it possible that it has something to do with the version of R? as I am using R 3.5.3? ( I am running it in a docker container, in case that is important)
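
For reference, "unexpected end of input" is a message from R's parser: it is raised when the code ends while an expression is still open, for example a call whose closing parenthesis is missing or sits on a commented-out line. Whether that is what happened here is not established in the thread; the logs above only show build_loom() running through "Adding layers..." before the error. A hypothetical base-R check of whether a piece of code parses completely could look like this:

```r
# Hypothetical helper: TRUE if the text parses as complete R code,
# otherwise the parser's error message.
check_parse <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) conditionMessage(e))
}

# A complete call spread over two lines, and the same call cut off
# after its first line (placeholder function and argument names).
complete_call <- 'f(key = "a",\n  value = 1)'
cut_off_call <- 'f(key = "a",'

print(check_parse(complete_call))
print(check_parse(cut_off_call))
```

For a whole script, parse(file = "script.R") performs the same check without running anything, so it can be tried before starting a long conversion (the file name is a placeholder).
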

WinklerLaura commented 3 years ago

It had nothing to do with the tSNE file. I tried it without the tSNE and got the same error (plus a warning):

[1] "Adding global attributes..."
[1] "Adding matrix..."
|======================================================================| 100%
[1] "Adding column attributes..."
[1] "Adding default metrics nUMI..."
[1] "Adding default metrics nGene..."
[1] "Adding row attributes..."
[1] "Adding columns graphs..."
[1] "Adding row graphs..."
[1] "Adding layers..."
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  No default embedding set for the loom. You'll not be able to visualize it in SCope.
Error: unexpected end of input
Execution halted

tropfenameimer commented 3 years ago

ok, i have made a loom file with the GABAergic cells only: http://scope.aertslab.org/#/22b5f09c-ab5d-4395-a171-c6a3fec4daee/ (go to: user uploaded -> uncategorized). when you type 'label' in the gene box, you can select annotations to colour the plot. please download the loom file (small disk symbol next to the loom file), as the session will expire in a few days.

the full data set was too big for my computer. i'd have to try on a different machine. maybe that's why it doesn't work for you.

WinklerLaura commented 3 years ago

Thank you so so much!!! That is actually the reason why I am working on a big server, but maybe it is still getting too big when converting into a loom file. Thank you so much for your help!!