Closed drighelli closed 3 years ago
Hi Dario, @drighelli
1) The main input for the RaggedExperiment is a GRanges / GRangesList with unequal measurements for each sample.
2) If you are only working with one sample OR if you have multiple samples with uniform measurements, it may be better to use a RangedSummarizedExperiment / SingleCellExperiment.
If you still have case number 1, the input would be a GRangesList, and then you could use either sparseAssay or compactAssay to get a dgCMatrix output.
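As a rough illustration of case 1, the sketch below builds a RaggedExperiment from a GRangesList and extracts sparse matrices. The sample names, ranges, and counts are made up for the example, and it assumes a RaggedExperiment version in which sparseAssay and compactAssay accept sparse = TRUE.

```r
library(GenomicRanges)
library(RaggedExperiment)

# Two hypothetical samples measured over different, partly shared ranges
sample1 <- GRanges(c("chr1:1-10", "chr1:20-30", "chr2:5-15"),
                   counts = c(2L, 4L, 1L))
sample2 <- GRanges(c("chr1:1-10", "chr2:40-50"),
                   counts = c(3L, 5L))

grl <- GRangesList(sample1 = sample1, sample2 = sample2)
re <- RaggedExperiment(grl)

# One row per range per sample (5 rows here); uses the first mcols column
sp <- sparseAssay(re, sparse = TRUE)

# One row per unique range (4 rows here)
cp <- compactAssay(re, sparse = TRUE)
```

The sparse form keeps every per-sample range as its own row, while the compact form merges identical ranges across samples, which is usually closer to a conventional count matrix.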
Hi Marcel @LiNk-NY ,
Thanks for your reply. I do indeed have multiple samples, but I have a sparse matrix for each of them, so I'd like to store them.
Is that possible?
Hi Dario, @drighelli
Interesting. How is your data stored? File type?
We don't have a coercion function to go from a sparse matrix to RaggedExperiment, but that could be a way to do it.
Here is an example of a subset of the data. I already have the data stored separately as GRanges and as a sparse Matrix:
rang.sub <- seu@assays$ATAC@ranges[1:10]
> rang.sub
GRanges object with 10 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 9790-10676 *
[2] chr1 180654-181318 *
[3] chr1 191155-192066 *
[4] chr1 267573-268458 *
[5] chr1 270881-271760 *
[6] chr1 585751-586643 *
[7] chr1 629500-630394 *
[8] chr1 633579-634475 *
[9] chr1 778287-779202 *
[10] chr1 816875-817771 *
-------
seqinfo: 36 sequences from an unspecified genome; no seqlengths
> mat.sub <- seu@assays$ATAC@counts[1:10,1:10]
> mat.sub
10 x 10 sparse Matrix of class "dgCMatrix"
[[ suppressing 10 column names ‘AAACAGCCAGAATGAC-1’, ‘AAACAGCCAGCTACGT-1’, ‘AAACAGCCAGGCCTTG-1’ ... ]]
chr1-9790-10676 . . . . . . . . . .
chr1-180654-181318 . . . . . . . . . .
chr1-191155-192066 . . . . . . . . . .
chr1-267573-268458 . . . . . . . . . .
chr1-270881-271760 . . . . . . . . . .
chr1-585751-586643 . . . . . . . . . .
chr1-629500-630394 2 . . . . . . . . .
chr1-633579-634475 4 . 2 4 2 . . 2 2 2
chr1-778287-779202 2 . 2 2 . . 2 2 . .
chr1-816875-817771 . . . . . . . . . .
This looks like a SingleCellExperiment / RangedSummarizedExperiment the way it is being represented now.
If you had a GRangesList, it would be easier to convert. I will look into this more.
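For data shaped like this (one set of ranges shared by every cell, plus one count matrix), a RangedSummarizedExperiment can hold both directly. The following is only a sketch with made-up stand-ins for the ranges and counts shown above; the cell names are hypothetical.

```r
library(SummarizedExperiment)
library(Matrix)

# Hypothetical stand-in for the shared ranges
ranges <- GRanges(c("chr1:629500-630394", "chr1:633579-634475",
                    "chr1:778287-779202"))
names(ranges) <- as.character(ranges)

# Hypothetical stand-in for the sparse count matrix (3 ranges x 2 cells)
counts <- sparseMatrix(
    i = c(1, 2, 3, 2, 3), j = c(1, 1, 1, 2, 2), x = c(2, 4, 2, 2, 2),
    dims = c(3, 2),
    dimnames = list(names(ranges), c("cell1", "cell2"))
)

# Every cell uses the same ranges, so one matrix plus one GRanges is enough
se <- SummarizedExperiment(assays = list(counts = counts), rowRanges = ranges)
```

Because the rows are identical across cells, nothing is "ragged" here, and the sparse matrix is stored as an assay without conversion.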
Hi Dario, @drighelli
I've added a coercion method from dgCMatrix to RaggedExperiment. Let me know how it goes.
39dcae3ad3009f5f9cf0b8c8ae3b4be0b035a71f
Version 1.17.2
Thanks Marcel, @LiNk-NY
I've tested the coercion and it's not behaving as I expected: the coercion itself runs, but the ranges are not imported properly.
This is my input sparse matrix:
> subcounts
10 x 10 sparse Matrix of class "dgCMatrix"
[[ suppressing 10 column names ‘AAACAGCCAGAATGAC-1’, ‘AAACAGCCAGCTACGT-1’, ‘AAACAGCCAGGCCTTG-1’ ... ]]
chr1:9790-10676 . . . . . . . . . .
chr1:180654-181318 . . . . . . . . . .
chr1:191155-192066 . . . . . . . . . .
chr1:267573-268458 . . . . . . . . . .
chr1:270881-271760 . . . . . . . . . .
chr1:585751-586643 . . . . . . . . . .
chr1:629500-630394 2 . . . . . . . . .
chr1:633579-634475 4 . 2 4 2 . . 2 2 2
chr1:778287-779202 2 . 2 2 . . 2 2 . .
chr1:816875-817771 . . . . . . . . . .
This is my code and output:
> ragexp <- as(subcounts, "RaggedExperiment")
> assay(ragexp)
AAACAGCCAGAATGAC-1 AAACAGCCAGGCCTTG-1 AAACATGCAGCAATAA-1 AAACATGCAGCCAGAA-1
chr1:629500-630394 2 NA NA NA
chr1:633579-634475 4 NA NA NA
chr1:778287-779202 2 NA NA NA
chr1:633579-634475 NA 2 NA NA
chr1:778287-779202 NA 2 NA NA
chr1:633579-634475 NA NA 4 NA
chr1:778287-779202 NA NA 2 NA
chr1:633579-634475 NA NA NA 2
chr1:778287-779202 NA NA NA NA
chr1:633579-634475 NA NA NA NA
chr1:778287-779202 NA NA NA NA
chr1:633579-634475 NA NA NA NA
chr1:633579-634475 NA NA NA NA
AAACATGCAGTTTCTC-1 AAACCAACAACTAGGG-1 AAACCAACAATAACCT-1 AAACCAACACTTAGGC-1
chr1:629500-630394 NA NA NA NA
chr1:633579-634475 NA NA NA NA
chr1:778287-779202 NA NA NA NA
chr1:633579-634475 NA NA NA NA
chr1:778287-779202 NA NA NA NA
chr1:633579-634475 NA NA NA NA
chr1:778287-779202 NA NA NA NA
chr1:633579-634475 NA NA NA NA
chr1:778287-779202 2 NA NA NA
chr1:633579-634475 NA 2 NA NA
chr1:778287-779202 NA 2 NA NA
chr1:633579-634475 NA NA 2 NA
chr1:633579-634475 NA NA NA 2
> rowRanges(ragexp)
GRanges object with 13 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 629500-630394 *
[2] chr1 633579-634475 *
[3] chr1 778287-779202 *
[4] chr1 633579-634475 *
[5] chr1 778287-779202 *
... ... ... ...
[9] chr1 778287-779202 *
[10] chr1 633579-634475 *
[11] chr1 778287-779202 *
[12] chr1 633579-634475 *
[13] chr1 633579-634475 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
As you can see, I have 10 ranges in the input matrix, but 13 in the RaggedExperiment. There also seem to be some repetitions...
Additionally, it would be useful to recognize the ranges in the chr-start-end in addition to the actual chr:start-end. (But this is a really really minor thing).
Thanks again, hope this testing could be useful :)
Hi Dario, @drighelli
That looks correct to me. The rowRanges function shows the unlisted ranges from all the samples, so there will be repetitions.
You have to use compactAssay(ragexp, sparse = TRUE) to get a similar representation.
The representation only keeps non-empty rows and columns.
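To make the row count concrete: in the output above, the 13 ranges correspond to the 13 non-zero entries of the input matrix, one per (range, cell) pair. The sketch below reproduces that on a made-up 3 x 3 matrix; the cell names and counts are hypothetical, and it assumes RaggedExperiment 1.17.2 or later so that the dgCMatrix coercion is available.

```r
library(RaggedExperiment)
library(Matrix)

# Hypothetical count matrix; row names use the chr:start-end form
sm <- sparseMatrix(
    i = c(1, 2, 3, 2, 3), j = c(1, 1, 1, 3, 3), x = c(2, 4, 2, 2, 2),
    dims = c(3, 3),
    dimnames = list(
        c("chr1:629500-630394", "chr1:633579-634475", "chr1:778287-779202"),
        c("cellA", "cellB", "cellC")
    )
)

re <- as(sm, "RaggedExperiment")

# rowRanges() unlists the per-cell ranges: one entry per non-zero count
n_ranges <- length(rowRanges(re))
n_nonzero <- nnzero(sm)

# compactAssay() collapses back to one row per unique range
cp <- compactAssay(re, sparse = TRUE)
```

Comparing n_ranges with n_nonzero is a quick sanity check that the coercion picked up every non-zero count.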
You wrote: "Additionally, it would be useful to recognize the ranges in the chr-start-end in addition to the actual chr:start-end. (But this is a really really minor thing)."
For this, we are using the GRanges character constructor. If you'd like it to be supported, please open an issue at @Bioconductor/GenomicRanges. For example:
> GRanges("chr1-1-10")
Error in asMethod(object) :
The character vector to convert to a GRanges object must contain
strings of the form "chr:start-end" or "chr:start-end:strand", with end
>= start - 1, or "chr:pos" or "chr:pos:strand". For example:
"chr1:2501-2900", "chr1:2501-2900:+", or "chr1:740". Note that ".." is
a valid alternate start/end separator. Strand can be "+", "-", "*", or
missing.
Best, Marcel
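Until the chr-start-end form is supported upstream, one possible workaround is to rewrite the row names before coercion. This is only a sketch in base R; the ids vector is a hypothetical stand-in for rownames() of the count matrix.

```r
# Hypothetical row names in the chr-start-end form
ids <- c("chr1-9790-10676", "chr1-180654-181318")

# Replace only the separator before the start coordinate with ":"
fixed <- sub("^(.+)-([0-9]+)-([0-9]+)$", "\\1:\\2-\\3", ids)
fixed
#> [1] "chr1:9790-10676"    "chr1:180654-181318"
```

The rewritten strings are in the form that the GRanges character constructor accepts, so assigning them back with rownames(x) <- fixed before calling as(x, "RaggedExperiment") should avoid the error shown above.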
oh I see!
Thanks again Marcel!
Also, I renamed the scores mcols to counts ... e60c883
Version 1.17.3
Hi, I'm trying to use this class for an ATAC-seq single-cell experiment, which means that I have a (sparse) count matrix and a list of regions.
Here is an example of the regions I have
And when I build the RaggedExperiment, I obtain these two assays, which come from the elementMetadata of the GRanges and which I'm not interested in, because they are just metadata.
So, I'm not sure I correctly understand the assays of this class, because there are no dedicated examples and no dedicated vignette section.
But I would expect to supply one or more count matrices through the classic assays=List(counts=myMatrix, sparseCounts=mydgCMatrix) in the object constructor.
I've also tried to do an assay assignment, but obtained the following result
So, what I'm saying is that (in case this class is meant to be used for single-cell data) it might be preferable to have a more classic approach to constructing the object, along with support for sparse matrices.
Thanks for any clarification :)
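The behaviour described above follows from how the class derives its assays: each metadata column of the input ranges is exposed as one assay. The sketch below illustrates this with made-up regions and column names (counts and peak_score are hypothetical), and shows one possible way to keep only the column of interest before construction.

```r
library(GenomicRanges)
library(RaggedExperiment)

# Hypothetical regions carrying two metadata columns
gr <- GRanges(
    c("chr1:9790-10676", "chr1:180654-181318"),
    counts = c(2L, 4L),
    peak_score = c(0.9, 0.7)
)

# Both metadata columns show up as assays
re_all <- RaggedExperiment(cell1 = gr)
n_all <- length(assays(re_all))

# Keep only the counts column before building the object
gr_counts <- gr
mcols(gr_counts) <- mcols(gr_counts)["counts"]
re_counts <- RaggedExperiment(cell1 = gr_counts)
n_counts <- length(assays(re_counts))
```

Trimming mcols up front is a design choice for this example only; the unwanted columns could equally be left in place and ignored by always requesting the assay by name.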