Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges

GRanges Error: "Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent" #47

Closed ahorn720 closed 3 years ago

ahorn720 commented 3 years ago

I'm trying to create a GRanges object from a data frame. Here is my data frame:

> head(cna[,1:3],15)
   chrom loc.start   loc.end
1      1         1  60278618
2      1  60278619  60280056
3      1  60280057 122661020
4      1 122661021 124971287
5      1 124971288 125073244
6      1 125073245 125116434
7      1 125116435 234775098
8      1 234775099 234802730
9      1 234802731 248956422
10     2         1  89349774
11     2  89349775  94340332
12     2  94340333 242193529
13     3         1  90718825
14     3  90718826  91226754
15     3  91226755 188755202

I can make a GRanges object with 11 rows:

>     i = 1
>     j = 11
>     gr <- GRanges(seqnames=Rle(cna$chrom[i:j]),
+                   ranges=IRanges(cna$loc.start[i:j],
+                                  cna$loc.end[i:j]),
+                   strand="*") # Turn into GRanges object
>     gr
GRanges object with 11 ranges and 0 metadata columns:
       seqnames              ranges strand
          <Rle>           <IRanges>  <Rle>
   [1]        1          1-60278618      *
   [2]        1   60278619-60280056      *
   [3]        1  60280057-122661020      *
   [4]        1 122661021-124971287      *
   [5]        1 124971288-125073244      *
   [6]        1 125073245-125116434      *
   [7]        1 125116435-234775098      *
   [8]        1 234775099-234802730      *
   [9]        1 234802731-248956422      *
  [10]        2          1-89349774      *
  [11]        2   89349775-94340332      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

but once I add another row, up to 12, it throws an error:

>     j = 12
>     gr <- GRanges(seqnames=Rle(cna$chrom[i:j]),
+                   ranges=IRanges(cna$loc.start[i:j],
+                                  cna$loc.end[i:j]),
+                   strand="*") # Turn into GRanges object
>     gr
GRanges object with 12 ranges and 0 metadata columns:
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent

I'm only showing a small portion of the larger problem. I want to make the first 3 columns of cna into a GRanges object, but I found that these row errors were happening.

LiNk-NY commented 3 years ago

Hi @ahorn720, can you provide a minimal reproducible example? Also, what is the return of BiocManager::valid() and BiocManager::version()? I would recommend you use makeGRangesFromDataFrame.
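
For readers following along, here is a minimal sketch of the makeGRangesFromDataFrame approach suggested above. It assumes a data frame with the column names shown in the original post (chrom, loc.start, loc.end); the three rows are copied from that post, chrom is stored as character for simplicity, and the object names cna_small and gr are only illustrative.

```r
library(GenomicRanges)

# First three rows of the data frame shown in the original post
cna_small <- data.frame(
  chrom     = c("1", "1", "1"),
  loc.start = c(1L, 60278619L, 60280057L),
  loc.end   = c(60278618L, 60280056L, 122661020L)
)

# Name the coordinate columns explicitly instead of relying on auto-detection
gr <- makeGRangesFromDataFrame(
  cna_small,
  seqnames.field = "chrom",
  start.field    = "loc.start",
  end.field      = "loc.end"
)
gr
```

Because the data frame has no strand column, every range is assigned the unspecified strand "*", which matches the strand="*" argument in the original GRanges() call.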

ahorn720 commented 3 years ago

Thank you for the quick response. I spent far too long trying to figure this out. I’ve attached the cna data frame from the original post.

Let me know if you need me to send it in a different format though.

Here is the code I was running, which gave me the error:

> gr <- GRanges(seqnames=paste0("chr",cna$chrom),
+               ranges=IRanges(cna$loc.start,
+                              cna$loc.end),
+               strand="*") # Turn into GRanges object
> gr
GRanges object with 125 ranges and 0 metadata columns:
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent

> BiocManager::version()
[1] ‘3.11’

I’ve attached the BiocManager::valid() output.

Thank you!

Aaron

LiNk-NY commented 3 years ago

Hi Aaron, @ahorn720

AFAIK attaching files does not work if you're responding via email. Please post the reproducible example here using reprex::reprex. If you have a small dataset to share, try using either dput or a data.frame() call in the reprex.
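
As a rough illustration of the dput route mentioned above, the sketch below builds a small stand-in data frame from a few rows shown in the original post and prints the code that recreates it. The names cna_small and rebuilt are hypothetical; the real data would be substituted before posting.

```r
# Small stand-in for the real data: a few rows copied from the original post
cna_small <- data.frame(
  chrom     = c("1", "1", "2"),
  loc.start = c(1L, 60278619L, 1L),
  loc.end   = c(60278618L, 60280056L, 89349774L)
)

# dput() prints R code that recreates the object; paste that output into
# the example so readers do not need the original file
dput(cna_small)

# Round-trip check: evaluating the dput() text rebuilds an equivalent object
dput_text <- paste(capture.output(dput(cna_small)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))

# With the script copied to the clipboard, reprex::reprex() renders it
# together with its output, ready to paste into the issue
```

Keeping the example to a handful of rows makes it easier to see whether the error depends on the data or on the installed package versions.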

I would recommend that you update your R version to >= 4.0. My hunch is that there is an issue with the data rather than GenomicRanges which is a pretty robust package. I'm looking forward to the reproducible example.

Best, Marcel

ahorn720 commented 3 years ago

Oh! I didn't see that this was on the GitHub thread! My apologies.

Well, after I ran BiocManager::valid() and saw that there was a recommended install, I ran that and refreshed the R session, and now the code works.

library(GenomicRanges)
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#>     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#>     union, unique, unsplit, which, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:base':
#> 
#>     expand.grid
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
cna = readr::read_rds("~/Desktop/cna_sample.rds")
head(cna,20)
#>    chrom loc.start   loc.end num.mark seg.mean copynumber minor_cn major_cn
#> 1      1         1  60278618    38700    0.013          2        1        1
#> 2      1  60278619  60280056        1    0.100          6        1        5
#> 3      1  60280057 122661020    46080   -0.019          2        1        1
#> 4      1 122661021 124971287      117    0.077          3        1        2
#> 5      1 124971288 125073244       78    0.006          2        1        1
#> 6      1 125073245 125116434        5    0.026          4        1        3
#> 7      1 125116435 234775098    53820   -0.006          2        1        1
#> 8      1 234775099 234802730       26   -0.023          3        1        2
#> 9      1 234802731 248956422     7841    0.008          2        1        1
#> 10     2         1  89349774    61171   -0.005          2        1        1
#> 11     2  89349775  94340332      307    0.071          3        1        2
#> 12     2  94340333 242193529    90515   -0.009          2        1        1
#> 13     3         1  90718825    65206   -0.008          2        1        1
#> 14     3  90718826  91226754       43    0.212          3        1        2
#> 15     3  91226755 188755202    63592   -0.013          2        1        1
#> 16     3 188755203 188756456        1    0.200          7        1        6
#> 17     3 188756457 195646798     5708   -0.011          2        1        1
#> 18     3 195646799 195723594       76    0.032          3        1        2
#> 19     3 195723595 198295559     1838    0.011          2        1        1
#> 20     4         1  49363652    33206   -0.006          2        1        1
#>    allelicratio LOHcall cellularprevalence ploidy normalproportion
#> 1         0.580     HET                 NA  2.004                0
#> 2         0.970   ASCNA                  1  2.004                0
#> 3         0.583     HET                 NA  2.004                0
#> 4         0.715    GAIN                  1  2.004                0
#> 5         0.586     HET                 NA  2.004                0
#> 6         0.826   ASCNA                  1  2.004                0
#> 7         0.582     HET                 NA  2.004                0
#> 8         0.719    GAIN                  1  2.004                0
#> 9         0.582     HET                 NA  2.004                0
#> 10        0.582     HET                 NA  2.004                0
#> 11        0.712    GAIN                  1  2.004                0
#> 12        0.583     HET                 NA  2.004                0
#> 13        0.583     HET                 NA  2.004                0
#> 14        0.697    GAIN                  1  2.004                0
#> 15        0.583     HET                 NA  2.004                0
#> 16        0.970   ASCNA                  1  2.004                0
#> 17        0.583     HET                 NA  2.004                0
#> 18        0.735    GAIN                  1  2.004                0
#> 19        0.582     HET                 NA  2.004                0
#> 20        0.582     HET                 NA  2.004                0
#>    logcopynumberratio seg.mean.adj
#> 1               0.000        0.016
#> 2               1.585        0.103
#> 3               0.000       -0.016
#> 4               0.585        0.080
#> 5               0.000        0.009
#> 6               1.000        0.029
#> 7               0.000       -0.003
#> 8               0.585       -0.020
#> 9               0.000        0.011
#> 10              0.000       -0.002
#> 11              0.585        0.074
#> 12              0.000       -0.006
#> 13              0.000       -0.005
#> 14              0.585        0.215
#> 15              0.000       -0.010
#> 16              1.807        0.203
#> 17              0.000       -0.008
#> 18              0.585        0.035
#> 19              0.000        0.014
#> 20              0.000       -0.003

gr <- GRanges(seqnames=cna$chrom,
              ranges=IRanges(cna$loc.start,
                             cna$loc.end),
              strand="*") # Turn into GRanges object
gr
#> GRanges object with 125 ranges and 0 metadata columns:
#>         seqnames              ranges strand
#>            <Rle>           <IRanges>  <Rle>
#>     [1]        1          1-60278618      *
#>     [2]        1   60278619-60280056      *
#>     [3]        1  60280057-122661020      *
#>     [4]        1 122661021-124971287      *
#>     [5]        1 124971288-125073244      *
#>     ...      ...                 ...    ...
#>   [121]       21   12985089-46709983      *
#>   [122]       22          1-11298106      *
#>   [123]       22   11298107-11462963      *
#>   [124]       22   11462964-15965511      *
#>   [125]       22   15965512-50818468      *
#>   -------
#>   seqinfo: 22 sequences from an unspecified genome; no seqlengths

BiocManager::valid()
#> Warning: 1 packages out-of-date; 1 packages too new
#> 
#> * sessionInfo()
#> 
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#> [1] GenomicRanges_1.40.0 GenomeInfoDb_1.24.2  IRanges_2.22.2      
#> [4] S4Vectors_0.26.1     BiocGenerics_0.34.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.30             XVector_0.28.0         magrittr_1.5          
#>  [4] hms_0.5.3              zlibbioc_1.34.0        R6_2.5.0              
#>  [7] rlang_0.4.8            stringr_1.4.0          highr_0.8             
#> [10] tools_4.0.2            xfun_0.19              ellipsis_0.3.1        
#> [13] htmltools_0.5.0        yaml_2.2.1             digest_0.6.27         
#> [16] tibble_3.0.4           lifecycle_0.2.0        crayon_1.3.4          
#> [19] GenomeInfoDbData_1.2.3 BiocManager_1.30.10    readr_1.4.0           
#> [22] vctrs_0.3.4            fs_1.5.0               bitops_1.0-6          
#> [25] RCurl_1.98-1.2         evaluate_0.14          rmarkdown_2.5         
#> [28] reprex_0.3.0.9001      stringi_1.5.3          pillar_1.4.6          
#> [31] compiler_4.0.2         pkgconfig_2.0.3       
#> 
#> Bioconductor version '3.11'
#> 
#>   * 1 packages out-of-date
#>   * 1 packages too new
#> 
#> create a valid installation with
#> 
#>   BiocManager::install(c(
#>     "RcppArmadillo", "reprex"
#>   ), update = TRUE, ask = FALSE)
#> 
#> more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

Created on 2020-11-16 by the reprex package (v0.3.0.9001)

I think it had something to do with IRanges needing to be updated.

Thank you very much for guiding me towards the answer!

ahorn720 commented 3 years ago

And for introducing me to reprex::reprex()! Very cool.

LiNk-NY commented 3 years ago

Hi Aaron, @ahorn720

I'm glad it worked out!

Note: I would still use makeGRangesFromDataFrame, possibly with keep.extra.columns=TRUE so that no data is lost. Also, consider RaggedExperiment for representing multiple samples with 'ragged' measurements.

Best, Marcel
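
A short sketch of the keep.extra.columns=TRUE suggestion, assuming the column names from the reprex output earlier in the thread. Only three rows and three of the extra columns are reproduced here, chrom is stored as character for simplicity, and cna_small and gr are illustrative names.

```r
library(GenomicRanges)

# Three rows from the reprex output, with a few of the extra columns
cna_small <- data.frame(
  chrom      = c("1", "1", "1"),
  loc.start  = c(1L, 60278619L, 60280057L),
  loc.end    = c(60278618L, 60280056L, 122661020L),
  num.mark   = c(38700L, 1L, 46080L),
  seg.mean   = c(0.013, 0.100, -0.019),
  copynumber = c(2L, 6L, 2L)
)

# keep.extra.columns = TRUE carries the non-coordinate columns over
# as metadata columns on the resulting GRanges object
gr <- makeGRangesFromDataFrame(
  cna_small,
  keep.extra.columns = TRUE,
  seqnames.field = "chrom",
  start.field    = "loc.start",
  end.field      = "loc.end"
)

# The extra columns are available through mcols() or the $ accessor
mcols(gr)
gr$copynumber
```

With the default keep.extra.columns = FALSE, only the ranges are kept and columns such as seg.mean and copynumber are dropped.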