Bioconductor / RaggedExperiment

Matrix-like representations of mutation and CN data
https://bioconductor.org/packages/RaggedExperiment

assays usage and constructor #27

Closed drighelli closed 3 years ago

drighelli commented 3 years ago

Hi, I'm trying to use this class for a single-cell ATAC-seq experiment, which means that I have a (sparse) count matrix and a list of regions.

Here is an example of the regions I have

GRanges object with 111857 ranges and 4 metadata columns:
             seqnames        ranges strand |     count percentile     
                <Rle>     <IRanges>  <Rle> | <numeric>  <numeric> 
       [1]       chr1    9790-10676      * |       242  0.3323529
       [2]       chr1 180654-181318      * |       220  0.3004193
       [3]       chr1 191155-192066      * |       139  0.1566107
       [4]       chr1 267573-268458      * |       357  0.4609814
       [5]       chr1 270881-271760      * |       104  0.0855557

When I build the RaggedExperiment, I get these two assays. They come from the elementMetadata of the GRanges, which I'm not interested in because they are just metadata:

> ragexp
class: RaggedExperiment 
dim: 111857 1 
assays(2): count percentile
rownames: NULL
colnames: NULL
colData names(0):

So I'm not sure I understand the assays of this class correctly, because there are no dedicated examples and no dedicated vignette section.

But I would expect to pass one or more count matrices through the classic assays=List(counts=myMatrix, sparseCounts=mydgCMatrix) argument of the object constructor.

I've also tried an assay assignment, but got the following result:

> assay(ragexp, withDimnames=FALSE)=seu@assays$ATAC@counts
> ragexp
class: RaggedExperiment 
dim: 111857 1 
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘mcols’ for signature ‘"dgCMatrix"’

So what I'm saying is that, if this class is meant to be used for single-cell data, it might be preferable to have a more classic approach to constructing the object, plus support for sparse matrices.

Thanks for any clarification :)

LiNk-NY commented 3 years ago

Hi Dario, @drighelli

1) The main input for the RaggedExperiment is a GRanges / GRangesList with unequal measurements for each sample.

2) If you are only working with one sample OR if you have multiple samples with uniform measurements, it may be better to use a RangedSummarizedExperiment / SingleCellExperiment.

If you still have case number 1, the input would be a GRangesList, and you could then use either sparseAssay or compactAssay to get a dgCMatrix output.
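As a minimal sketch of the GRangesList route described above: each sample gets its own GRanges, and each metadata column becomes an assay. The ranges, the sample names, and the `score` values here are made up for illustration and are not taken from this thread.

```r
library(GenomicRanges)
library(RaggedExperiment)

# Two samples measured on different sets of ranges (hypothetical values)
sample1 <- GRanges(c("chr1:1-10", "chr2:11-18"), score = c(1L, 2L))
sample2 <- GRanges(c("chr1:1-10", "chr2:11-18", "chr2:25-30"),
                   score = c(3L, 4L, 5L))

grl <- GRangesList(sample1 = sample1, sample2 = sample2)
re <- RaggedExperiment(grl)

# The metadata column of the ranges is exposed as an assay
assayNames(re)

# One row per range per sample, one column per sample;
# sparse = TRUE asks for a sparse Matrix instead of a base matrix
sa <- sparseAssay(re, sparse = TRUE)
dim(sa)
```

This is also why the `count` and `percentile` columns in the original question showed up as assays: the class reads its assays from the metadata columns of the ranges.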

drighelli commented 3 years ago

Hi Marcel @LiNk-NY ,

thanks for your reply. I do have multiple samples, but I have a sparse matrix for each of them, so I'd like to store them.

Is that possible?

LiNk-NY commented 3 years ago

Hi Dario, @drighelli

Interesting. How is your data stored? File type? We don't have a coercion function to go from sparse matrix to RaggedExperiment but that could be a way to do it.
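A manual conversion could look roughly like the sketch below. It is an illustration under stated assumptions, not the package's own method: the toy matrix, the cell names, and the `counts` column name are hypothetical, and the row names are assumed to already be in `chr:start-end` form.

```r
library(Matrix)
library(GenomicRanges)
library(RaggedExperiment)

# Toy peaks-by-cells count matrix (hypothetical values)
counts <- sparseMatrix(
    i = c(1, 2, 2, 3),
    j = c(1, 1, 2, 3),
    x = c(2, 4, 2, 1),
    dims = c(3, 3),
    dimnames = list(
        c("chr1:629500-630394", "chr1:633579-634475", "chr1:778287-779202"),
        c("cellA", "cellB", "cellC")
    )
)

# Parse the row names into genomic ranges
peaks <- GRanges(rownames(counts))

# One GRanges per cell, keeping only the peaks with a non-zero count
per_cell <- lapply(colnames(counts), function(cell) {
    v <- counts[, cell]            # numeric vector for one column
    keep <- unname(v != 0)
    gr <- peaks[keep]
    gr$counts <- unname(v[keep])   # becomes the "counts" assay
    gr
})
names(per_cell) <- colnames(counts)

ragexp <- RaggedExperiment(GRangesList(per_cell))
dim(ragexp)
```

Each non-zero entry of the matrix becomes one row of the RaggedExperiment, so the number of rows equals the number of non-zero counts rather than the number of peaks.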

drighelli commented 3 years ago

Here is a subset of the data as an example. I already have the data stored separately as GRanges and as a sparse Matrix.

> rang.sub <- seu@assays$ATAC@ranges[1:10]
> rang.sub
GRanges object with 10 ranges and 0 metadata columns:
       seqnames        ranges strand
          <Rle>     <IRanges>  <Rle>
   [1]     chr1    9790-10676      *
   [2]     chr1 180654-181318      *
   [3]     chr1 191155-192066      *
   [4]     chr1 267573-268458      *
   [5]     chr1 270881-271760      *
   [6]     chr1 585751-586643      *
   [7]     chr1 629500-630394      *
   [8]     chr1 633579-634475      *
   [9]     chr1 778287-779202      *
  [10]     chr1 816875-817771      *
  -------
  seqinfo: 36 sequences from an unspecified genome; no seqlengths
> mat.sub <- seu@assays$ATAC@counts[1:10,1:10]
> mat.sub
10 x 10 sparse Matrix of class "dgCMatrix"
   [[ suppressing 10 column names ‘AAACAGCCAGAATGAC-1’, ‘AAACAGCCAGCTACGT-1’, ‘AAACAGCCAGGCCTTG-1’ ... ]]

chr1-9790-10676    . . . . . . . . . .
chr1-180654-181318 . . . . . . . . . .
chr1-191155-192066 . . . . . . . . . .
chr1-267573-268458 . . . . . . . . . .
chr1-270881-271760 . . . . . . . . . .
chr1-585751-586643 . . . . . . . . . .
chr1-629500-630394 2 . . . . . . . . .
chr1-633579-634475 4 . 2 4 2 . . 2 2 2
chr1-778287-779202 2 . 2 2 . . 2 2 . .
chr1-816875-817771 . . . . . . . . . .

LiNk-NY commented 3 years ago

This looks like a SingleCellExperiment / RangedSummarizedExperiment the way it is being represented now. If you had a GRangesList, it would be easier to convert. I will look into this more.
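For data shaped like this (one shared set of ranges, one count per range per cell), a RangedSummarizedExperiment can hold the sparse matrix directly. The sketch below uses a hypothetical three-peak, three-cell matrix; the cell names and values are invented for illustration.

```r
library(Matrix)
library(GenomicRanges)
library(SummarizedExperiment)

peak_names <- c("chr1:629500-630394", "chr1:633579-634475",
                "chr1:778287-779202")
peaks <- GRanges(peak_names)
names(peaks) <- peak_names   # keep range names identical to the assay rownames

# Sparse peaks-by-cells counts (hypothetical values)
counts <- sparseMatrix(
    i = c(1, 2, 2, 3),
    j = c(1, 1, 2, 3),
    x = c(2, 4, 2, 2),
    dims = c(3, 3),
    dimnames = list(peak_names, c("cellA", "cellB", "cellC"))
)

se <- SummarizedExperiment(
    assays = list(counts = counts),
    rowRanges = peaks
)

se
class(assay(se, "counts"))   # the assay stays sparse
```

The same `assays` and `rowRanges` arguments should also work with the SingleCellExperiment constructor, which builds on this class.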

LiNk-NY commented 3 years ago

Hi Dario, @drighelli

I've added a coercion method from dgCMatrix to RaggedExperiment. Let me know how it goes.

Commit 39dcae3ad3009f5f9cf0b8c8ae3b4be0b035a71f, version 1.17.2.

drighelli commented 3 years ago

Thanks Marcel, @LiNk-NY

I've tested the coercion and it's not working as expected: the coercion runs, but the ranges are not properly imported.

This is my input sparse matrix:

> subcounts
10 x 10 sparse Matrix of class "dgCMatrix"
   [[ suppressing 10 column names ‘AAACAGCCAGAATGAC-1’, ‘AAACAGCCAGCTACGT-1’, ‘AAACAGCCAGGCCTTG-1’ ... ]]

chr1:9790-10676    . . . . . . . . . .
chr1:180654-181318 . . . . . . . . . .
chr1:191155-192066 . . . . . . . . . .
chr1:267573-268458 . . . . . . . . . .
chr1:270881-271760 . . . . . . . . . .
chr1:585751-586643 . . . . . . . . . .
chr1:629500-630394 2 . . . . . . . . .
chr1:633579-634475 4 . 2 4 2 . . 2 2 2
chr1:778287-779202 2 . 2 2 . . 2 2 . .
chr1:816875-817771 . . . . . . . . . .

This is my code and output:

> ragexp <- as(subcounts, "RaggedExperiment")
> assay(ragexp)
                   AAACAGCCAGAATGAC-1 AAACAGCCAGGCCTTG-1 AAACATGCAGCAATAA-1 AAACATGCAGCCAGAA-1
chr1:629500-630394                  2                 NA                 NA                 NA
chr1:633579-634475                  4                 NA                 NA                 NA
chr1:778287-779202                  2                 NA                 NA                 NA
chr1:633579-634475                 NA                  2                 NA                 NA
chr1:778287-779202                 NA                  2                 NA                 NA
chr1:633579-634475                 NA                 NA                  4                 NA
chr1:778287-779202                 NA                 NA                  2                 NA
chr1:633579-634475                 NA                 NA                 NA                  2
chr1:778287-779202                 NA                 NA                 NA                 NA
chr1:633579-634475                 NA                 NA                 NA                 NA
chr1:778287-779202                 NA                 NA                 NA                 NA
chr1:633579-634475                 NA                 NA                 NA                 NA
chr1:633579-634475                 NA                 NA                 NA                 NA
                   AAACATGCAGTTTCTC-1 AAACCAACAACTAGGG-1 AAACCAACAATAACCT-1 AAACCAACACTTAGGC-1
chr1:629500-630394                 NA                 NA                 NA                 NA
chr1:633579-634475                 NA                 NA                 NA                 NA
chr1:778287-779202                 NA                 NA                 NA                 NA
chr1:633579-634475                 NA                 NA                 NA                 NA
chr1:778287-779202                 NA                 NA                 NA                 NA
chr1:633579-634475                 NA                 NA                 NA                 NA
chr1:778287-779202                 NA                 NA                 NA                 NA
chr1:633579-634475                 NA                 NA                 NA                 NA
chr1:778287-779202                  2                 NA                 NA                 NA
chr1:633579-634475                 NA                  2                 NA                 NA
chr1:778287-779202                 NA                  2                 NA                 NA
chr1:633579-634475                 NA                 NA                  2                 NA
chr1:633579-634475                 NA                 NA                 NA                  2

> rowRanges(ragexp)
GRanges object with 13 ranges and 0 metadata columns:
       seqnames        ranges strand
          <Rle>     <IRanges>  <Rle>
   [1]     chr1 629500-630394      *
   [2]     chr1 633579-634475      *
   [3]     chr1 778287-779202      *
   [4]     chr1 633579-634475      *
   [5]     chr1 778287-779202      *
   ...      ...           ...    ...
   [9]     chr1 778287-779202      *
  [10]     chr1 633579-634475      *
  [11]     chr1 778287-779202      *
  [12]     chr1 633579-634475      *
  [13]     chr1 633579-634475      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

As you can see, I have 10 ranges in the input matrix but 13 in the RaggedExperiment. There also seem to be some repetitions...

Additionally, it would be useful to recognize ranges in the chr-start-end format in addition to the current chr:start-end. (But this is a really, really minor thing.)

Thanks again, I hope this testing is useful :)

LiNk-NY commented 3 years ago

Hi Dario, @drighelli

That looks correct to me. The rowRanges function shows the unlisted ranges from all the samples so there will be repetitions.

You have to use compactAssay(ragexp, sparse = TRUE) to get a similar representation. The representation only keeps non-empty rows and columns.
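The difference between the two views can be seen on a small hypothetical object (three cells sharing some peaks; the names and values are invented for illustration):

```r
library(GenomicRanges)
library(RaggedExperiment)

# Each cell lists only the peaks where it has a non-zero count
cellA <- GRanges(c("chr1:629500-630394", "chr1:633579-634475"),
                 counts = c(2, 4))
cellB <- GRanges("chr1:633579-634475", counts = 2)
cellC <- GRanges(c("chr1:633579-634475", "chr1:778287-779202"),
                 counts = c(2, 2))

re <- RaggedExperiment(
    GRangesList(cellA = cellA, cellB = cellB, cellC = cellC)
)

# rowRanges() concatenates the ranges of all samples,
# so a peak shared by several cells appears several times
length(rowRanges(re))           # all ranges, repetitions included
length(unique(rowRanges(re)))   # distinct peaks only

# compactAssay() collapses identical ranges into a single row
ca <- compactAssay(re, sparse = TRUE)
dim(ca)
```

Here `rowRanges()` reports five ranges for three distinct peaks, which mirrors the 13 versus 10 mismatch reported above.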

"Additionally, it would be useful to recognize the ranges in the chr-start-end in addition to the actual chr:start-end. (But this is a really really minor thing)."

For this, we are using the GRanges character constructor. If you'd like it to be supported, please open an issue at @Bioconductor/GenomicRanges. For example:

> GRanges("chr1-1-10")
Error in asMethod(object) : 
  The character vector to convert to a GRanges object must contain
  strings of the form "chr:start-end" or "chr:start-end:strand", with end
  >= start - 1, or "chr:pos" or "chr:pos:strand". For example:
  "chr1:2501-2900", "chr1:2501-2900:+", or "chr1:740". Note that ".." is
  a valid alternate start/end separator. Strand can be "+", "-", "*", or
  missing.

Best, Marcel
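Until such support exists, one possible workaround is to rewrite the row names before converting. This is a sketch assuming every name ends in `-start-end` with numeric coordinates; the `peak_names` vector is hypothetical.

```r
library(GenomicRanges)

# Peak names in the chr-start-end form
peak_names <- c("chr1-9790-10676", "chr1-180654-181318")

# Replace only the separator before the start coordinate with ":".
# Anchoring on the two trailing numeric fields keeps hyphens
# inside the sequence name untouched.
fixed <- sub("^(.+)-([0-9]+)-([0-9]+)$", "\\1:\\2-\\3", peak_names)
fixed

# The rewritten strings are accepted by the character constructor
gr <- GRanges(fixed)
gr
```

For a count matrix, the same substitution can be applied to `rownames()` before calling `as(x, "RaggedExperiment")`.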

drighelli commented 3 years ago

oh I see!

Thanks again Marcel!

LiNk-NY commented 3 years ago

Also, I renamed the scores mcols column to counts (commit e60c883, version 1.17.3).