Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges
41 stars 17 forks source link

Error in getListElement(x, i, ...) : GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment #76

Open AntoniaChroni opened 1 year ago

AntoniaChroni commented 1 year ago

Hey,

I am getting this error when I am using reduce function on a GRangesList (it was working fine a few days ago).

Error in getListElement(x, i, ...) : GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment

My script is: peaks = GRangesList() combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))

And the output looks like this:

peaks GRangesList object of length 11: $sample1 GRanges object with 98860 ranges and 0 metadata columns: seqnames ranges strand

[1] chr1 629770-630550 * [2] chr1 633794-634259 * [3] chr1 778203-779262 * [4] chr1 817083-817625 * [5] chr1 821110-821501 * ... ... ... ... [98856] chrY 56868863-56871397 * [98857] chrY 56872276-56872281 * [98858] chrY 56873716-56874522 * [98859] chrY 56877566-56877837 * [98860] chrY 56879680-56880015 * ------- seqinfo: 24 sequences from an unspecified genome; no seqlengths

...

<10 more elements> Any thoughts on what is happening? Thanks!
hpages commented 1 year ago

Hi @AntoniaChroni,

My script is: peaks = GRangesList() combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))

Is this really your script? It doesn't make much sense because with your code peaks is an empty GRangesList object with no sample1, or sample2, or sample3 list element. So it is expected to fail, but not with the same error as the one you got:

> peaks = GRangesList()
> combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘reduce’ for signature ‘"NULL"’

Please provide:

  1. A self-contained reproducible example that actually produces the error you got.
  2. Your sessionInfo().

Thanks, H.

AntoniaChroni commented 1 year ago

This is the whole script:

inputFiles <- c("./sample1/outs/fragments.tsv.gz", "./sample2/outs/fragments.tsv.gz", "./sample3/outs/fragments.tsv.gz")

names(inputFiles) = c("sample1", "sample2", "sample3")

matrixFiles = gsub(inputFiles,pattern = "fragments.tsv.gz", replacement = "") matrixFiles

barcodes = list() peaks = GRangesList()

for(i in names(matrixFiles)){ barcodes[[i]] = as.data.frame(readr::read_tsv(file = paste0(matrixFiles[i],"filtered_peak_bc_matrix/barcodes.tsv")), col_names = F)[,1] p = as.data.frame(readr::read_tsv(file = paste0(matrixFiles[i],"filtered_peak_bc_matrix/peaks.bed")), col_names = F)[,1:3] colnames(p) = c("chr","start","end") peaks[[i]] = makeGRangesFromDataFrame(df = p) # keep.extra.columns = T - this adds colmns related to annotation, percentGC, percentAT etc }

combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))

hpages commented 1 year ago

Please provide a self-contained reproducible example. The code you share with us is not self-contained because we don't have access to all those files that you are using, so we can't run it. And if we can't run it we can't reproduce the problem. And if we can't reproduce the problem then we can't fix it. :man_shrugging:

Also you're still not providing your sessionInfo().

AntoniaChroni commented 1 year ago

is there an email that I could share this data with you?

AntoniaChroni commented 1 year ago

sessionInfo() R version 4.2.2 (2022-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux

Matrix products: default BLAS: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRblas.so LAPACK: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] RColorBrewer_1.1-3 ggrepel_0.9.3
[3] harmony_0.1.1 Rcpp_1.0.10
[5] cowplot_1.1.1 clustree_0.5.0
[7] ggraph_2.1.0 patchwork_1.1.2
[9] lubridate_1.9.2 forcats_1.0.0
[11] stringr_1.5.0 purrr_1.0.1
[13] tidyr_1.3.0 tibble_3.1.8
[15] tidyverse_2.0.0 EnsDb.Hsapiens.v75_2.99.0
[17] JASPAR2020_0.99.10 TFBSTools_1.36.0
[19] dplyr_1.1.0 BSgenome.Hsapiens.UCSC.hg38_1.4.5 [21] BSgenome_1.66.3 rtracklayer_1.58.0
[23] Biostrings_2.66.0 XVector_0.38.0
[25] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.22.0
[27] AnnotationFilter_1.22.0 GenomicFeatures_1.50.4
[29] AnnotationDbi_1.60.0 Biobase_2.58.0
[31] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[33] IRanges_2.32.0 S4Vectors_0.36.1
[35] BiocGenerics_0.44.0 readr_2.1.4
[37] Matrix_1.5-3 reshape2_1.4.4
[39] ggplot2_3.4.1 SeuratObject_4.1.3
[41] Seurat_4.3.0 Signac_1.9.0

loaded via a namespace (and not attached): [1] rappdirs_0.3.3 scattermore_0.8
[3] R.methodsS3_1.8.2 knitr_1.42
[5] bit64_4.0.5 irlba_2.3.5.1
[7] DelayedArray_0.24.0 R.utils_2.12.2
[9] data.table_1.14.8 rpart_4.1.19
[11] KEGGREST_1.38.0 RCurl_1.98-1.10
[13] generics_0.1.3 RSQLite_2.3.0
[15] RANN_2.6.1 future_1.31.0
[17] bit_4.0.5 tzdb_0.3.0
[19] spatstat.data_3.0-0 xml2_1.3.3
[21] httpuv_1.6.9 SummarizedExperiment_1.28.0 [23] assertthat_0.2.1 DirichletMultinomial_1.40.0 [25] viridis_0.6.2 xfun_0.37
[27] hms_1.1.2 promises_1.2.0.1
[29] fansi_1.0.4 restfulr_0.0.15
[31] progress_1.2.2 caTools_1.18.2
[33] dbplyr_2.3.0 igraph_1.4.1
[35] DBI_1.1.3 htmlwidgets_1.6.1
[37] spatstat.geom_3.0-6 ellipsis_0.3.2
[39] backports_1.4.1 annotate_1.76.0
[41] biomaRt_2.54.0 deldir_1.0-6
[43] MatrixGenerics_1.10.0 vctrs_0.5.2
[45] ROCR_1.0-11 abind_1.4-5
[47] cachem_1.0.7 withr_2.5.0
[49] ggforce_0.4.1 progressr_0.13.0
[51] vroom_1.6.1 checkmate_2.1.0
[53] sctransform_0.3.5 GenomicAlignments_1.34.0
[55] prettyunits_1.1.1 goftest_1.2-3
[57] cluster_2.1.4 lazyeval_0.2.2
[59] seqLogo_1.64.0 crayon_1.5.2
[61] spatstat.explore_3.0-6 pkgconfig_2.0.3
[63] tweenr_2.0.2 nlme_3.1-162
[65] ProtGenerics_1.30.0 nnet_7.3-18
[67] rlang_1.0.6 globals_0.16.2
[69] lifecycle_1.0.3 miniUI_0.1.1.1
[71] filelock_1.0.2 BiocFileCache_2.6.1
[73] dichromat_2.0-0.1 polyclip_1.10-4
[75] matrixStats_0.63.0 lmtest_0.9-40
[77] zoo_1.8-11 base64enc_0.1-3
[79] ggridges_0.5.4 png_0.1-8
[81] viridisLite_0.4.1 rjson_0.2.21
[83] bitops_1.0-7 R.oo_1.25.0
[85] KernSmooth_2.23-20 blob_1.2.3
[87] parallelly_1.34.0 spatstat.random_3.1-3
[89] jpeg_0.1-10 CNEr_1.34.0
[91] scales_1.2.1 memoise_2.0.1
[93] magrittr_2.0.3 plyr_1.8.8
[95] ica_1.0-3 zlibbioc_1.44.0
[97] compiler_4.2.2 BiocIO_1.8.0
[99] fitdistrplus_1.1-8 Rsamtools_2.14.0
[101] cli_3.6.0 listenv_0.9.0
[103] pbapply_1.7-0 htmlTable_2.4.1
[105] Formula_1.2-5 MASS_7.3-58.2
[107] tidyselect_1.2.0 stringi_1.7.12
[109] yaml_2.3.7 latticeExtra_0.6-30
[111] grid_4.2.2 VariantAnnotation_1.44.1
[113] fastmatch_1.1-3 tools_4.2.2
[115] timechange_0.2.0 future.apply_1.10.0
[117] rstudioapi_0.14 TFMPvalue_0.0.9
[119] foreign_0.8-84 gridExtra_2.3
[121] farver_2.1.1 Rtsne_0.16
[123] digest_0.6.31 shiny_1.7.4
[125] pracma_2.4.2 later_1.3.0
[127] RcppAnnoy_0.0.20 httr_1.4.5
[129] biovizBase_1.46.0 colorspace_2.1-0
[131] XML_3.99-0.13 tensor_1.5
[133] reticulate_1.28 splines_4.2.2
[135] uwot_0.1.14 RcppRoll_0.3.0
[137] spatstat.utils_3.0-1 graphlayouts_0.8.4
[139] sp_1.6-0 plotly_4.10.1
[141] xtable_1.8-4 jsonlite_1.8.4
[143] poweRlaw_0.70.6 tidygraph_1.2.3
[145] R6_2.5.1 Hmisc_4.8-0
[147] pillar_1.8.1 htmltools_0.5.4
[149] mime_0.12 glue_1.6.2
[151] fastmap_1.1.1 BiocParallel_1.32.5
[153] codetools_0.2-19 utf8_1.2.3
[155] lattice_0.20-45 spatstat.sparse_3.0-0
[157] curl_5.0.0 leiden_0.4.3
[159] gtools_3.9.4 GO.db_3.16.0
[161] interp_1.1-3 survival_3.5-3
[163] munsell_0.5.0 GenomeInfoDbData_1.2.9
[165] gtable_0.3.1

hpages commented 1 year ago

I don't know what files exactly you want to send me by email but your code above uses 9 possibly big files. Indiscriminately sending me all that stuff by email and let me figure out what I really need to reproduce the problem is not the way to go. You need to figure out what's strictly needed i.e. it's on you to try to reduce the size of the requirements as much as possible.

This usually starts by trying to simplify your example as much as possible to isolate the code that reproduces the problem, and get rid of anything else. For example, we don't care about the 3 fragments.tsv.gz or 3 barcodes.tsv files, because they are not used at all to construct GRangesList object peaks. The only thing you need in order to construct this object are the peaks.bed files. So the first thing you want to do is simplify your reproducible example like this:

bedFiles <- c("./sample1/outs/filtered_peak_bc_matrix/peaks.bed",
              "./sample2/outs/filtered_peak_bc_matrix/peaks.bed",
              "./sample3/outs/filtered_peak_bc_matrix/peaks.bed")
names(bedFiles) = c("sample1", "sample2", "sample3")

peaks = GRangesList()

for(i in names(bedFiles)){
    p = as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)[,1:3]
    colnames(p) = c("chr","start","end")
    peaks[[i]] = makeGRangesFromDataFrame(df = p) # keep.extra.columns = T - this adds colmns related to annotation, percentGC, percentAT etc
}

combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))

BTW you also want to make the effort to indent your code properly to make it easier for others to read and understand it.

Also now that the code is smaller, we can actually start to focus on it and pay attention to details. FWIW I notice that the col_names argument (which is an argument of the readr::read_tsv() function) is used in the call to as.data.frame() instead of the call to readr::read_tsv(). This is because of how the parentheses are placed:

as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)

when they should probably be:

as.data.frame(readr::read_tsv(file = bedFiles[i], col_names = F))

So before you share the peaks.bed data with us, can you run this code and confirm that it still reproduces the problem?Also can you share here what you get when you dispay peaks$sample1, peaks$sample2, and peaks$sample3? Thanks!

AntoniaChroni commented 1 year ago

So, I have done all these and still have the same error. The code was working fine before. BTW when I do: lapply(peaks,length) $sample1 [1] 98860

$sample2 [1] 76862

That returns a list of peaks per sample.

I am sorry for the confusion with the code and session info, but I was not aware of it.

hpages commented 1 year ago

What about sharing what you get when you display peaks$sample1, peaks$sample2, and peaks$sample3?

Also what do you think of the misplaced parentheses in as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F). You happy with that? It does what you want? Have you tried to re-run the code after fixing the parentheses? Does it make a difference?

AntoniaChroni commented 1 year ago

changing the parentheses is returning NULL for additional samples. If I keep the parentheses as they are:

peaks$sample1

GRanges object with 98860 ranges and 0 metadata columns:

      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]     chr1     629770-630550      *
  [2]     chr1     633794-634259      *
  [3]     chr1     778203-779262      *
  [4]     chr1     817083-817625      *
  [5]     chr1     821110-821501      *
  ...      ...               ...    ...

[98856]     chrY 56868863-56871397      *
[98857]     chrY 56872276-56872281      *

[98858]     chrY 56873716-56874522      *

[98859]   chrY 56877566-56877837      *

[98860]     chrY 56879680-56880015      *

-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths

peaks$sample2

GRanges object with 76862 ranges and 0 metadata columns:

      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]     chr1     629766-630143      *
  [2]     chr1     633792-634267      *
  [3]     chr1     778232-779166      *
  [4]     chr1     817123-817561      *
  [5]     chr1     820927-821538      *
  ...      ...               ...    ...

[76858]     chrY 56866489-56866827      *
[76859]     chrY 56868795-56872323      *
[76860]     chrY 56872875-56874168      *
[76861]     chrY 56877550-56877852      *
[76862]     chrY 56879622-56880373      *

-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
hpages commented 1 year ago

What about peaks$sample3? Why aren't you showing it? Note that it's the 3rd time I ask you to show it. You're making it really difficult to help you.

hpages commented 1 year ago

Also please show what you get when doing:

c(peaks$sample1, peaks$sample2, peaks$sample3)

BTW it wouldn't hurt that you take some time to learn the basics of how to properly format your posts here on GitHub e.g. any code or output produced by the code should be placed between triple backticks lines (```).

AntoniaChroni commented 1 year ago

I wanted to keep it simple, so I only included 2 samples when following your suggestions when using reduce function.

hpages commented 1 year ago

And you still get the error when doing reduce(c(peaks$sample1, peaks$sample2))? Can you please do that now and show what you get?

AntoniaChroni commented 1 year ago

reduce(c(peaks$sample1, peaks$sample2))

Error in getListElement(x, i, ...) :

 GRanges objects don't support [[, as.list(), lapply(), or unlist() at
 the moment
hpages commented 1 year ago

Formatting is still not good. I wrote "between triple backticks lines". This means that you must put a line with triple backticks before and after the code or the output of the code (so that's 2 lines with triple backticks). Also you can press Preview to see how things get formatted before you actually post. It's free so do not hesitate to use it.

Ok so now we've simplified your original reproducible example to this:

bedFiles <- c("./sample1/outs/filtered_peak_bc_matrix/peaks.bed",
              "./sample2/outs/filtered_peak_bc_matrix/peaks.bed")
names(bedFiles) = c("sample1", "sample2")

peaks = GRangesList()

for(i in names(bedFiles)){
    p = as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)[,1:3]
    colnames(p) = c("chr","start","end")
    peaks[[i]] = makeGRangesFromDataFrame(df = p) # keep.extra.columns = T - this adds colmns related to annotation, percentGC, percentAT etc
}

combined.peaks <- reduce(c(peaks$sample1, peaks$sample2))

Again, can you show what you get when doing c(peaks$sample1, peaks$sample2), as previously asked already?

AntoniaChroni commented 1 year ago

hey, this hasn't been resolved yet. I woudl appreciate your help on this. So, this is what I am getting:

c(peaks$sample1, peaks$sample2)

GRanges object with 175722 ranges and 0 metadata columns:
           seqnames            ranges strand
              <Rle>         <IRanges>  <Rle>
       [1]     chr1     629770-630550      *
       [2]     chr1     633794-634259      *
       [3]     chr1     778203-779262      *
       [4]     chr1     817083-817625      *
       [5]     chr1     821110-821501      *
       ...      ...               ...    ...
  [175718]     chrY 56866489-56866827      *
  [175719]     chrY 56868795-56872323      *
  [175720]     chrY 56872875-56874168      *
  [175721]     chrY 56877550-56877852      *
  [175722]     chrY 56879622-56880373      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
hpages commented 1 year ago

At this point I will ask you to make your 2 bed files (sample1/outs/filtered_peak_bc_matrix/peaks.bed and sample2/outs/filtered_peak_bc_matrix/peaks.bed) available somewhere and to provide the links in this issue, so anybody can access and download them. You could try to reduce their size (e.g. by keeping only a few hundred lines), that would help.

However you absolutely want to make sure that you can still reproduce the issue using the files that you're going to share with us, and with BioC 3.16. The idea is that me and others can also reproduce the issue by just running the code that I posted in my last comment above using these files. So make sure you can do that too. Thanks!