GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License
382 stars 137 forks source link

Error when loading in custom peak annotation #479

Closed cnk113 closed 3 years ago

cnk113 commented 3 years ago

ArchR-addPeakAnnotations-165f957dadfc2-Date-2020-12-28_Time-02-49-15.log

I'm trying to load in my peak calls using addPeakAnnotations but I end up getting this error:

Error in Matrix::sparseMatrix(i = queryHits(overlapRegions), j = match(names(allPositions),  : 
  NA's in (i,j) are not allowed

The granges object has the bare minimum with a score for the peak. It's the summit file of a MACS2 run.

rcorces commented 3 years ago

In the future, please use the issue templates that have been provided.

I'm trying to load in my peak calls using addPeakAnnotations

It sounds like you are using the wrong function: https://www.archrproject.com/reference/addPeakAnnotations.html

I think you want: https://www.archrproject.com/reference/addPeakSet.html

cnk113 commented 3 years ago

Sorry, I want to do custom deviations with my peak set. This is cut&tag data from a bulk sample. I'm pretty much trying to recreate what is on the manual on custom deviations with these peaks. It looks like that function overrides my current peak set which is what I'm not trying to do.

rcorces commented 3 years ago

Please provide more details. Please reference the portion of the manual that you are trying to reproduce. Show what your GRanges object(s) and your ArchR peak set look like.

cnk113 commented 3 years ago

The portion from the manual. At the bottom for loading the custom set. Granges object looks like this for the custom peak set:

GRanges object with 6 ranges and 1 metadata column:
      seqnames        ranges strand |      score
         <Rle>     <IRanges>  <Rle> |  <numeric>
  [1]     chr1   10201-10202      * | 1016.36902
  [2]     chr1   29121-29122      * |   82.17918
  [3]     chr1 180779-180780      * |  143.97285
  [4]     chr1 199676-199677      * |   92.60435
  [5]     chr1 631126-631127      * |   14.31855
  [6]     chr1 632035-632036      * |   48.81945

For the scATAC-seq peak set:

GRanges object with 6 ranges and 12 metadata columns:
        seqnames        ranges strand |     score replicateScoreQuantile groupScoreQuantile Reproducibility
           <Rle>     <IRanges>  <Rle> | <numeric>              <numeric>          <numeric>       <numeric>
  Unk_2     chr1 797708-798208      * |   2.59522                  0.579              0.579               2
  Unk_2     chr1 804686-805186      * |   6.32581                  0.881              0.881               2
  CCL_2     chr1 807679-808179      * |    3.0609                  0.156              0.156            <NA>
  CXCL8     chr1 817120-817620      * |  23.20684                  0.907              0.907            <NA>
  Unk_2     chr1 824768-825268      * |   2.59522                  0.579              0.579               2
  CCL_2     chr1 826696-827196      * |   4.58737                  0.367              0.367            <NA>
        GroupReplicate distToGeneStart nearestGene    peakType distToTSS  nearestTSS        GC       idx
           <character>       <integer> <character> <character> <integer> <character> <numeric> <integer>
  Unk_2   Unk_2._.Rep1           19412      FAM87B      Distal      6840  uc057aum.1    0.4291         1
  Unk_2   Unk_2._.Rep1           12434      FAM87B      Distal         3  uc057aum.1    0.4012         2
  CCL_2   CCL_2._.Rep1            9441      FAM87B      Distal      2996  uc057aum.1    0.4212         3
  CXCL8   CXCL8._.Rep1               0      FAM87B    Promoter         0  uc057aum.1     0.485         4
  Unk_2   Unk_2._.Rep1             119   LINC01128    Promoter       119  uc057auo.1    0.3952         5
  CCL_2   CCL_2._.Rep1             575   LINC00115      Exonic       113  uc031tlo.2    0.6088         6
rcorces commented 3 years ago

I'm not exactly sure what is going on. Maybe its a bug with only using a single GRanges object as input? Can you try:

1) Using the ENCODE ChIP seq peaks that are used in the manual 2) Using your GRanges object and adding in a second random GRanges object

jgranja24 commented 3 years ago

Hi @cnk113,

I believe your issue is because you are not using as input a GRangesList with names

#' @param regions AlistofGRangesthat are to be overlapped with thepeakSetin theArchRProject.

addPeakAnnotations Input-Parameters$regions: length = 1
GRangesList object of length 1:
[[1]]
GRanges object with 38089 ranges and 1 metadata column:
          seqnames            ranges strand |      score
             <Rle>         <IRanges>  <Rle> |  <numeric>
      [1]     chr1       10201-10202      * | 1016.36902
      [2]     chr1       29121-29122      * |   82.17918
      [3]     chr1     180779-180780      * |  143.97285
      [4]     chr1     199676-199677      * |   92.60435
      [5]     chr1     631126-631127      * |   14.31855
      ...      ...               ...    ... .        ...
  [38085]     chrY 11312535-11312536      * |   29.51248
  [38086]     chrY 11313456-11313457      * |   10.43217
  [38087]     chrY 11329965-11329966      * |   14.10791
  [38088]     chrY 11333437-11333438      * |   13.06451
  [38089]     chrY 56850900-56850901      * |   16.30159
  -------
  seqinfo: 52 sequences from an unspecified genome; no seqlengths

Please try adding a name, ienames(regions) <- c("CUTTAG")prior to input. The error is because there is no names and thus the column names would be nothing.

For future reference i added a workaround in release_1.0.1

    if(is.null(names(regions))){
      names(regions) <- paste0("Region_", seq_along(regions))
    }
cnk113 commented 3 years ago

Yeah it works now, thanks!