Open AntoniaChroni opened 1 year ago
Hi @AntoniaChroni,
My script is: peaks = GRangesList() combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))
Is this really your script? It doesn't make much sense because with your code peaks
is an empty GRangesList object with no sample1
, or sample2
, or sample3
list element. So it is expected to fail, but not with the same error as the one you got:
> peaks = GRangesList()
> combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘reduce’ for signature ‘"NULL"’
Please provide:
sessionInfo()
.Thanks, H.
This is the whole script:
inputFiles <- c("./sample1/outs/fragments.tsv.gz", "./sample2/outs/fragments.tsv.gz", "./sample3/outs/fragments.tsv.gz")
names(inputFiles) = c("sample1", "sample2", "sample3")
matrixFiles = gsub(inputFiles,pattern = "fragments.tsv.gz", replacement = "") matrixFiles
barcodes = list() peaks = GRangesList()
for(i in names(matrixFiles)){ barcodes[[i]] = as.data.frame(readr::read_tsv(file = paste0(matrixFiles[i],"filtered_peak_bc_matrix/barcodes.tsv")), col_names = F)[,1] p = as.data.frame(readr::read_tsv(file = paste0(matrixFiles[i],"filtered_peak_bc_matrix/peaks.bed")), col_names = F)[,1:3] colnames(p) = c("chr","start","end") peaks[[i]] = makeGRangesFromDataFrame(df = p) # keep.extra.columns = T - this adds colmns related to annotation, percentGC, percentAT etc }
combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))
Please provide a self-contained reproducible example. The code you share with us is not self-contained because we don't have access to all those files that you are using, so we can't run it. And if we can't run it we can't reproduce the problem. And if we can't reproduce the problem then we can't fix it. :man_shrugging:
Also you're still not providing your sessionInfo()
.
is there an email that I could share this data with you?
sessionInfo() R version 4.2.2 (2022-10-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux
Matrix products: default BLAS: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRblas.so LAPACK: /gpfs/share/apps/R/4.2.2/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base
other attached packages:
[1] RColorBrewer_1.1-3 ggrepel_0.9.3
[3] harmony_0.1.1 Rcpp_1.0.10
[5] cowplot_1.1.1 clustree_0.5.0
[7] ggraph_2.1.0 patchwork_1.1.2
[9] lubridate_1.9.2 forcats_1.0.0
[11] stringr_1.5.0 purrr_1.0.1
[13] tidyr_1.3.0 tibble_3.1.8
[15] tidyverse_2.0.0 EnsDb.Hsapiens.v75_2.99.0
[17] JASPAR2020_0.99.10 TFBSTools_1.36.0
[19] dplyr_1.1.0 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[21] BSgenome_1.66.3 rtracklayer_1.58.0
[23] Biostrings_2.66.0 XVector_0.38.0
[25] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.22.0
[27] AnnotationFilter_1.22.0 GenomicFeatures_1.50.4
[29] AnnotationDbi_1.60.0 Biobase_2.58.0
[31] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[33] IRanges_2.32.0 S4Vectors_0.36.1
[35] BiocGenerics_0.44.0 readr_2.1.4
[37] Matrix_1.5-3 reshape2_1.4.4
[39] ggplot2_3.4.1 SeuratObject_4.1.3
[41] Seurat_4.3.0 Signac_1.9.0
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 scattermore_0.8
[3] R.methodsS3_1.8.2 knitr_1.42
[5] bit64_4.0.5 irlba_2.3.5.1
[7] DelayedArray_0.24.0 R.utils_2.12.2
[9] data.table_1.14.8 rpart_4.1.19
[11] KEGGREST_1.38.0 RCurl_1.98-1.10
[13] generics_0.1.3 RSQLite_2.3.0
[15] RANN_2.6.1 future_1.31.0
[17] bit_4.0.5 tzdb_0.3.0
[19] spatstat.data_3.0-0 xml2_1.3.3
[21] httpuv_1.6.9 SummarizedExperiment_1.28.0
[23] assertthat_0.2.1 DirichletMultinomial_1.40.0
[25] viridis_0.6.2 xfun_0.37
[27] hms_1.1.2 promises_1.2.0.1
[29] fansi_1.0.4 restfulr_0.0.15
[31] progress_1.2.2 caTools_1.18.2
[33] dbplyr_2.3.0 igraph_1.4.1
[35] DBI_1.1.3 htmlwidgets_1.6.1
[37] spatstat.geom_3.0-6 ellipsis_0.3.2
[39] backports_1.4.1 annotate_1.76.0
[41] biomaRt_2.54.0 deldir_1.0-6
[43] MatrixGenerics_1.10.0 vctrs_0.5.2
[45] ROCR_1.0-11 abind_1.4-5
[47] cachem_1.0.7 withr_2.5.0
[49] ggforce_0.4.1 progressr_0.13.0
[51] vroom_1.6.1 checkmate_2.1.0
[53] sctransform_0.3.5 GenomicAlignments_1.34.0
[55] prettyunits_1.1.1 goftest_1.2-3
[57] cluster_2.1.4 lazyeval_0.2.2
[59] seqLogo_1.64.0 crayon_1.5.2
[61] spatstat.explore_3.0-6 pkgconfig_2.0.3
[63] tweenr_2.0.2 nlme_3.1-162
[65] ProtGenerics_1.30.0 nnet_7.3-18
[67] rlang_1.0.6 globals_0.16.2
[69] lifecycle_1.0.3 miniUI_0.1.1.1
[71] filelock_1.0.2 BiocFileCache_2.6.1
[73] dichromat_2.0-0.1 polyclip_1.10-4
[75] matrixStats_0.63.0 lmtest_0.9-40
[77] zoo_1.8-11 base64enc_0.1-3
[79] ggridges_0.5.4 png_0.1-8
[81] viridisLite_0.4.1 rjson_0.2.21
[83] bitops_1.0-7 R.oo_1.25.0
[85] KernSmooth_2.23-20 blob_1.2.3
[87] parallelly_1.34.0 spatstat.random_3.1-3
[89] jpeg_0.1-10 CNEr_1.34.0
[91] scales_1.2.1 memoise_2.0.1
[93] magrittr_2.0.3 plyr_1.8.8
[95] ica_1.0-3 zlibbioc_1.44.0
[97] compiler_4.2.2 BiocIO_1.8.0
[99] fitdistrplus_1.1-8 Rsamtools_2.14.0
[101] cli_3.6.0 listenv_0.9.0
[103] pbapply_1.7-0 htmlTable_2.4.1
[105] Formula_1.2-5 MASS_7.3-58.2
[107] tidyselect_1.2.0 stringi_1.7.12
[109] yaml_2.3.7 latticeExtra_0.6-30
[111] grid_4.2.2 VariantAnnotation_1.44.1
[113] fastmatch_1.1-3 tools_4.2.2
[115] timechange_0.2.0 future.apply_1.10.0
[117] rstudioapi_0.14 TFMPvalue_0.0.9
[119] foreign_0.8-84 gridExtra_2.3
[121] farver_2.1.1 Rtsne_0.16
[123] digest_0.6.31 shiny_1.7.4
[125] pracma_2.4.2 later_1.3.0
[127] RcppAnnoy_0.0.20 httr_1.4.5
[129] biovizBase_1.46.0 colorspace_2.1-0
[131] XML_3.99-0.13 tensor_1.5
[133] reticulate_1.28 splines_4.2.2
[135] uwot_0.1.14 RcppRoll_0.3.0
[137] spatstat.utils_3.0-1 graphlayouts_0.8.4
[139] sp_1.6-0 plotly_4.10.1
[141] xtable_1.8-4 jsonlite_1.8.4
[143] poweRlaw_0.70.6 tidygraph_1.2.3
[145] R6_2.5.1 Hmisc_4.8-0
[147] pillar_1.8.1 htmltools_0.5.4
[149] mime_0.12 glue_1.6.2
[151] fastmap_1.1.1 BiocParallel_1.32.5
[153] codetools_0.2-19 utf8_1.2.3
[155] lattice_0.20-45 spatstat.sparse_3.0-0
[157] curl_5.0.0 leiden_0.4.3
[159] gtools_3.9.4 GO.db_3.16.0
[161] interp_1.1-3 survival_3.5-3
[163] munsell_0.5.0 GenomeInfoDbData_1.2.9
[165] gtable_0.3.1
I don't know what files exactly you want to send me by email but your code above uses 9 possibly big files. Indiscriminately sending me all that stuff by email and let me figure out what I really need to reproduce the problem is not the way to go. You need to figure out what's strictly needed i.e. it's on you to try to reduce the size of the requirements as much as possible.
This usually starts by trying to simplify your example as much as possible to isolate the code that reproduces the problem, and get rid of anything else. For example, we don't care about the 3 fragments.tsv.gz or 3 barcodes.tsv files, because they are not used at all to construct GRangesList object peaks
. The only thing you need in order to construct this object are the peaks.bed files. So the first thing you want to do is simplify your reproducible example like this:
bedFiles <- c("./sample1/outs/filtered_peak_bc_matrix/peaks.bed",
"./sample2/outs/filtered_peak_bc_matrix/peaks.bed",
"./sample3/outs/filtered_peak_bc_matrix/peaks.bed")
names(bedFiles) = c("sample1", "sample2", "sample3")
peaks = GRangesList()
for(i in names(bedFiles)){
p = as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)[,1:3]
colnames(p) = c("chr","start","end")
peaks[[i]] = makeGRangesFromDataFrame(df = p) # keep.extra.columns = T - this adds colmns related to annotation, percentGC, percentAT etc
}
combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))
BTW you also want to make the effort to indent your code properly to make it easier for others to read and understand it.
Also now that the code is smaller, we can actually start to focus on it and pay attention to details. FWIW I notice that the col_names
argument (which is an argument of the readr::read_tsv()
function) is used in the call to as.data.frame()
instead of the call to readr::read_tsv()
. This is because of how the parentheses are placed:
as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)
when they should probably be:
as.data.frame(readr::read_tsv(file = bedFiles[i], col_names = F))
So before you share the peaks.bed data with us, can you run this code and confirm that it still reproduces the problem?Also can you share here what you get when you dispay peaks$sample1
, peaks$sample2
, and peaks$sample3
? Thanks!
So, I have done all these and still have the same error. The code was working fine before.
BTW when I do:
lapply(peaks,length)
$sample1
[1] 98860
$sample2 [1] 76862
That returns a list of peaks per sample.
I am sorry for the confusion with the code and session info, but I was not aware of it.
What about sharing what you get when you display peaks$sample1
, peaks$sample2
, and peaks$sample3
?
Also what do you think of the misplaced parentheses in as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)
. You happy with that? It does what you want? Have you tried to re-run the code after fixing the parentheses? Does it make a difference?
changing the parentheses is returning NULL for additional samples. If I keep the parentheses as they are:
peaks$sample1
GRanges object with 98860 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 629770-630550 *
[2] chr1 633794-634259 *
[3] chr1 778203-779262 *
[4] chr1 817083-817625 *
[5] chr1 821110-821501 *
... ... ... ...
[98856] chrY 56868863-56871397 *
[98857] chrY 56872276-56872281 *
[98858] chrY 56873716-56874522 *
[98859] chrY 56877566-56877837 *
[98860] chrY 56879680-56880015 *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
peaks$sample2
GRanges object with 76862 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 629766-630143 *
[2] chr1 633792-634267 *
[3] chr1 778232-779166 *
[4] chr1 817123-817561 *
[5] chr1 820927-821538 *
... ... ... ...
[76858] chrY 56866489-56866827 *
[76859] chrY 56868795-56872323 *
[76860] chrY 56872875-56874168 *
[76861] chrY 56877550-56877852 *
[76862] chrY 56879622-56880373 *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
What about peaks$sample3
? Why aren't you showing it? Note that it's the 3rd time I ask you to show it. You're making it really difficult to help you.
Also please show what you get when doing:
c(peaks$sample1, peaks$sample2, peaks$sample3)
BTW it wouldn't hurt that you take some time to learn the basics of how to properly format your posts here on GitHub e.g. any code or output produced by the code should be placed between triple backticks lines (```
).
I wanted to keep it simple, so I only included 2 samples when following your suggestions when using reduce function.
And you still get the error when doing reduce(c(peaks$sample1, peaks$sample2))
? Can you please do that now and show what you get?
reduce(c(peaks$sample1, peaks$sample2))
Error in getListElement(x, i, ...) :
GRanges objects don't support [[, as.list(), lapply(), or unlist() at
the moment
Formatting is still not good. I wrote "between triple backticks lines". This means that you must put a line with triple backticks before and after the code or the output of the code (so that's 2 lines with triple backticks). Also you can press Preview to see how things get formatted before you actually post. It's free so do not hesitate to use it.
Ok so now we've simplified your original reproducible example to this:
bedFiles <- c("./sample1/outs/filtered_peak_bc_matrix/peaks.bed",
"./sample2/outs/filtered_peak_bc_matrix/peaks.bed")
names(bedFiles) = c("sample1", "sample2")
peaks = GRangesList()
for(i in names(bedFiles)){
p = as.data.frame(readr::read_tsv(file = bedFiles[i]), col_names = F)[,1:3]
colnames(p) = c("chr","start","end")
peaks[[i]] = makeGRangesFromDataFrame(df = p) # keep.extra.columns = T - this adds colmns related to annotation, percentGC, percentAT etc
}
combined.peaks <- reduce(c(peaks$sample1, peaks$sample2))
Again, can you show what you get when doing c(peaks$sample1, peaks$sample2)
, as previously asked already?
hey, this hasn't been resolved yet. I woudl appreciate your help on this. So, this is what I am getting:
c(peaks$sample1, peaks$sample2)
GRanges object with 175722 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 629770-630550 *
[2] chr1 633794-634259 *
[3] chr1 778203-779262 *
[4] chr1 817083-817625 *
[5] chr1 821110-821501 *
... ... ... ...
[175718] chrY 56866489-56866827 *
[175719] chrY 56868795-56872323 *
[175720] chrY 56872875-56874168 *
[175721] chrY 56877550-56877852 *
[175722] chrY 56879622-56880373 *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
At this point I will ask you to make your 2 bed files (sample1/outs/filtered_peak_bc_matrix/peaks.bed
and sample2/outs/filtered_peak_bc_matrix/peaks.bed
) available somewhere and to provide the links in this issue, so anybody can access and download them. You could try to reduce their size (e.g. by keeping only a few hundred lines), that would help.
However you absolutely want to make sure that you can still reproduce the issue using the files that you're going to share with us, and with BioC 3.16. The idea is that me and others can also reproduce the issue by just running the code that I posted in my last comment above using these files. So make sure you can do that too. Thanks!
Hey,
I am getting this error when I am using reduce function on a GRangesList (it was working fine a few days ago).
Error in getListElement(x, i, ...) : GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment
My script is: peaks = GRangesList() combined.peaks <- reduce(c(peaks$sample1, peaks$sample2, peaks$sample3))
And the output looks like this:
...
<10 more elements> Any thoughts on what is happening? Thanks!