charles-plessy / CAGEr

Mirror of Bioconductor's CAGEr package repository
https://bioconductor.org/packages/CAGEr

getCTSS() failed for inputFilesType = "bamPairedEnd" in CAGEr 2.0.1 (not CAGEr 1.34.0) #49

Closed Pentayouth closed 2 years ago

Pentayouth commented 2 years ago

I wanted to run getCTSS() directly on a paired-end (PE) BAM file. I ran:

ce_nepc <- CAGEexp(genomeName = "BSgenome.Hsapiens.UCSC.hg38",
                   inputFiles = inputFiles,
                   inputFilesType = "bamPairedEnd",
                   sampleLabels = c("cjc","xxz")); ce_nepc

getCTSS(ce_nepc, correctSystematicG = F)

It works well with CAGEr 1.34.0. However, I want to use exportToTrack(), so I updated to CAGEr 2.0.1, and the same code now gives an error:

Reading in file: /public/home/lijing/wangzw/te210720/cage_study/staralign/CJC-2_S1/rrna_rm.markdup.bam...
    -> Filtering out low quality reads...
    -> Removing the first base of the reads if 'G' and not aligned to the genome...
Error: BiocParallel errors
  1 remote errors, element index: 1
  1 unevaluated and other errors
  first remote error: unable to find an inherited method for function 'update_ranges' for signature '"UnstitchedIPos"'

I searched for a solution and found only one relevant post, https://stat.ethz.ch/pipermail/bioc-devel/2019-September/015524.html, which said:

the methods package automatically creates a coercion method from CTSS to GRanges for you. Unfortunately this method is broken.

I have no idea what is happening, but something must be wrong.

My session info is:

R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

Matrix products: default
BLAS/LAPACK: /public/home/lijing/miniconda3/envs/zwcage/lib/libopenblasp-r0.3.18.so

locale:
[1] C

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] CAGEr_2.0.1                 MultiAssayExperiment_1.20.0
 [3] SummarizedExperiment_1.24.0 Biobase_2.54.0             
 [5] GenomicRanges_1.46.0        GenomeInfoDb_1.30.0        
 [7] IRanges_2.28.0              S4Vectors_0.32.0           
 [9] BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
[11] matrixStats_0.61.0         

loaded via a namespace (and not attached):
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.4 stringdist_0.9.8                 
 [3] Rcpp_1.0.7                        lattice_0.20-45                  
 [5] formula.tools_1.7.1               Rsamtools_2.10.0                 
 [7] Biostrings_2.62.0                 gtools_3.9.2                     
 [9] utf8_1.2.2                        R6_2.5.1                         
[11] plyr_1.8.6                        ggplot2_3.3.5                    
[13] pillar_1.6.3                      sparseMatrixStats_1.6.0          
[15] zlibbioc_1.40.0                   rlang_0.4.11                     
[17] rstudioapi_0.13                   data.table_1.14.2                
[19] vegan_2.5-7                       Matrix_1.3-4                     
[21] splines_4.1.1                     BiocParallel_1.28.0              
[23] stringr_1.4.0                     RCurl_1.98-1.5                   
[25] munsell_0.5.0                     DelayedArray_0.20.0              
[27] compiler_4.1.1                    rtracklayer_1.54.0               
[29] pkgconfig_2.0.3                   mgcv_1.8-38                      
[31] tidyselect_1.1.1                  tibble_3.1.5                     
[33] GenomeInfoDbData_1.2.6            XML_3.99-0.8                     
[35] permute_0.9-5                     fansi_0.4.2                      
[37] crayon_1.4.1                      dplyr_1.0.7                      
[39] MASS_7.3-54                       GenomicAlignments_1.30.0         
[41] bitops_1.0-7                      grid_4.1.1                       
[43] nlme_3.1-153                      gtable_0.3.0                     
[45] lifecycle_1.0.1                   magrittr_2.0.1                   
[47] scales_1.1.1                      KernSmooth_2.23-20               
[49] som_0.3-5.1                       stringi_1.7.5                    
[51] cachem_1.0.6                      reshape2_1.4.4                   
[53] XVector_0.34.0                    DelayedMatrixStats_1.16.0        
[55] ellipsis_0.3.2                    vctrs_0.3.8                      
[57] generics_0.1.0                    rjson_0.2.20                     
[59] restfulr_0.0.13                   tools_4.1.1                      
[61] BSgenome_1.62.0                   glue_1.4.2                       
[63] purrr_0.3.4                       parallel_4.1.1                   
[65] fastmap_1.1.0                     yaml_2.2.1                       
[67] colorspace_2.0-2                  cluster_2.1.2                    
[69] operator.tools_1.6.3              VGAM_1.1-5                       
[71] memoise_2.0.0                     BiocIO_1.4.0 

charles-plessy commented 2 years ago

Thanks for the report, can you share small-size bam files that reproduce the error? If subsetting in a small region is not enough, try subsetting two regions from two different chromosomes, making sure that there is signal on both strands. Hopefully that would be enough.

Pentayouth commented 2 years ago

Thank you for your kind reply. I'm happy to share my BAM files, but I don't know how. Can you download them from the link below? The file size is ~8M. https://pan.baidu.com/s/1BqclqGIZGZ-2gXykgSUAFA The share password is 451f

Pentayouth commented 2 years ago

BTW, the code that generated the small bam is: samtools view -Sb rrna_rm.markdup.bam chr21 chr22 > *.small.bam

Pentayouth commented 2 years ago

I really appreciate the chance to be in direct contact with you, the developer, so I would like to ask two more questions and hope you can clarify them.

  1. After I failed to input the BAM files, I tried to input the data another way, for example as a ctss file. But I can't find any clue in the vignette about how to generate such a file. I understand the definition of CTSS, but I wonder whether the *.ctss format is formally defined.
  2. When calling getCTSS(), I tried to specify correctSystematicG = TRUE, but I got an error saying that correctSystematicG = TRUE currently doesn't support CAGEexp objects. It looks weird to me; could you please explain it?

Thank you in advance for any reply or suggestion.

Wang

charles-plessy commented 2 years ago

> Can you download from the link below?

Sorry, but I get an error message which, according to Google Translate, means that the link has expired. Since the file is only 8M, can you e-mail it to me? My maintainer address is charles.plessy at oist.jp.

Pentayouth commented 2 years ago

Sure, I've sent it to you. Please let me know if everything works fine.

charles-plessy commented 2 years ago

> 1. After I failed to input the BAM files, I tried to input the data another way, for example as a ctss file. But I can't find any clue in the vignette about how to generate such a file. I understand the definition of CTSS, but I wonder whether the *.ctss format is formally defined.

CTSS is a format that was used in an earlier FANTOM project; I would not recommend it today. Can you try converting to BED and using the BED format?
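For readers wondering how a single-base CTSS record relates to BED, here is a minimal base-R sketch. It assumes a four-column CTSS table (chromosome, 1-based position, strand, tag count); that layout, the helper name `ctss_to_bed` and the toy values are assumptions for illustration and are not taken from the CAGEr source.

```r
# Toy CTSS table (made-up values); assumed layout:
# chromosome, 1-based position, strand, tag count.
ctss <- data.frame(
  chr    = c("chr21", "chr21", "chr22"),
  pos    = c(100L, 250L, 500L),
  strand = c("+", "+", "-"),
  count  = c(3L, 1L, 7L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: BED uses 0-based, half-open intervals, so a single
# base at 1-based position p becomes start = p - 1 and end = p.
ctss_to_bed <- function(ctss) {
  data.frame(
    chrom  = ctss$chr,
    start  = ctss$pos - 1L,
    end    = ctss$pos,
    name   = ".",
    score  = ctss$count,
    strand = ctss$strand,
    stringsAsFactors = FALSE
  )
}

bed <- ctss_to_bed(ctss)

# Write a tab-separated, header-less BED file to a temporary location.
bed_file <- tempfile(fileext = ".bed")
write.table(bed, file = bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Which inputFilesType value such a file should be loaded with depends on the CAGEr version, so check the getCTSS() help page before relying on this.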

> 2. When calling getCTSS(), I tried to specify `correctSystematicG = TRUE`, but I got an error saying that `correctSystematicG = TRUE` currently doesn't support CAGEexp objects. It looks weird to me; could you please explain it?

In BAM files where 5-prime mismatches are not soft-clipped (which was the standard when CAGEr was created), the alignment can start on a mismatched G that was added by the reverse transcriptase, and the removeFirstG option corrects that. Very often, that is enough. But what if a G was added and happens to match a G in the genome? The correctSystematicG option is an attempt to address that case. It is a more complex algorithm, and when I replaced the CAGEset class with the more efficient CAGEexp class, I did not find time to port it.
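The first-G logic described above can be sketched in base R as follows. This is only a simplified illustration for plus-strand reads on a toy sequence; `strip_first_g` is a hypothetical helper, not CAGEr's implementation.

```r
# Hypothetical helper, not part of CAGEr: if the first base of a plus-strand
# read is a G that does not match the genome at the alignment start, treat it
# as added by the reverse transcriptase and move the start one base downstream.
strip_first_g <- function(read_seq, genome_seq, start) {
  first_read_base   <- substr(read_seq, 1, 1)
  first_genome_base <- substr(genome_seq, start, start)
  if (first_read_base == "G" && first_genome_base != "G") {
    start + 1L
  } else {
    start
  }
}

genome <- "ACGTTACGGA"  # toy genome; position 1 is A, position 3 is G

mismatched_g <- strip_first_g("GCGTT", genome, 1L)  # G vs A: start shifts
no_leading_g <- strip_first_g("CGTTA", genome, 2L)  # no leading G: unchanged
matching_g   <- strip_first_g("GTTAC", genome, 3L)  # G vs G: unchanged
```

The last call is the ambiguous case: the leading G matches the genome, so this simple rule cannot tell whether it was added or is genuine. That is the situation correctSystematicG is meant to address.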

charles-plessy commented 2 years ago

On my computer it works, but I only have time to test in a container with R-devel and the Bioconductor devel version. Can you double-check that the files you sent me trigger the error on your side?

> inputFiles <- c("cjc.small.bam",  "xxz.small.bam")
> 
> ce_nepc <- CAGEexp(genomeName = "BSgenome.Hsapiens.UCSC.hg38",
+                    inputFiles = inputFiles,
+                    inputFilesType = "bamPairedEnd",
+                    sampleLabels = c("cjc","xxz"))
> ce_nepc
A CAGEexp object of 0 listed
 experiments with no user-defined names and respective classes.
 Containing an ExperimentList class object of length 0:
 Functionality:
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save data to flat files
> 
> ce_nepc <- getCTSS(ce_nepc, correctSystematicG = FALSE)

Reading in file: cjc.small.bam...
    -> Filtering out low quality reads...
    -> Removing the first base of the reads if 'G' and not aligned to the genome...

Reading in file: xxz.small.bam...
    -> Filtering out low quality reads...
    -> Removing the first base of the reads if 'G' and not aligned to the genome...
> 
> colData(ce_nepc)
DataFrame with 2 rows and 4 columns
       inputFiles inputFilesType sampleLabels librarySizes
      <character>    <character>  <character>    <integer>
cjc cjc.small.bam   bamPairedEnd          cjc        18887
xxz xxz.small.bam   bamPairedEnd          xxz        92637
> 
> CTSScoordinatesGR(ce_nepc)
CTSS object with 11141 positions and 0 metadata columns:
          seqnames       pos strand
             <Rle> <integer>  <Rle>
      [1]    chr21   5096795      +
      [2]    chr21   5101807      +
      [3]    chr21   5118090      +
      [4]    chr21   5123229      +
      [5]    chr21   6564651      +
      ...      ...       ...    ...
  [11137]    chr22  50783636      -
  [11138]    chr22  50783641      -
  [11139]    chr22  50783642      -
  [11140]    chr22  50783646      -
  [11141]    chr22  50783652      -
  -------
  seqinfo: 640 sequences (1 circular) from hg38 genome
  BSgenome name: BSgenome.Hsapiens.UCSC.hg38 
> 
> sessionInfo()
R Under development (unstable) (2021-10-27 r81107)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bookworm/sid

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CAGEr_2.1.0                 MultiAssayExperiment_1.21.0 SummarizedExperiment_1.25.0 Biobase_2.55.0              GenomicRanges_1.46.0        GenomeInfoDb_1.31.0        
 [7] IRanges_2.29.0              S4Vectors_0.33.0            BiocGenerics_0.41.0         MatrixGenerics_1.7.0        matrixStats_0.61.0         

loaded via a namespace (and not attached):
 [1] VGAM_1.1-5                        splines_4.2.0                     DelayedMatrixStats_1.17.0         gtools_3.9.2                      assertthat_0.2.1                 
 [6] BiocManager_1.30.16               BSgenome_1.63.0                   GenomeInfoDbData_1.2.7            Rsamtools_2.11.0                  yaml_2.2.1                       
[11] pillar_1.6.4                      lattice_0.20-45                   glue_1.4.2                        XVector_0.35.0                    colorspace_2.0-2                 
[16] Matrix_1.3-4                      plyr_1.8.6                        XML_3.99-0.8                      pkgconfig_2.0.3                   zlibbioc_1.41.0                  
[21] purrr_0.3.4                       scales_1.1.1                      stringdist_0.9.8                  BiocParallel_1.29.0               tibble_3.1.5                     
[26] mgcv_1.8-38                       generics_0.1.1                    ggplot2_3.3.5                     ellipsis_0.3.2                    cachem_1.0.6                     
[31] formula.tools_1.7.1               magrittr_2.0.1                    crayon_1.4.1                      memoise_2.0.0                     fansi_0.5.0                      
[36] nlme_3.1-153                      operator.tools_1.6.3              MASS_7.3-54                       vegan_2.5-7                       tools_4.2.0                      
[41] data.table_1.14.2                 BiocIO_1.5.0                      lifecycle_1.0.1                   stringr_1.4.0                     munsell_0.5.0                    
[46] cluster_2.1.2                     DelayedArray_0.21.0               Biostrings_2.63.0                 som_0.3-5.1                       compiler_4.2.0                   
[51] rlang_0.4.12                      grid_4.2.0                        RCurl_1.98-1.5                    rstudioapi_0.13                   rjson_0.2.20                     
[56] bitops_1.0-7                      restfulr_0.0.13                   gtable_0.3.0                      DBI_1.1.1                         reshape2_1.4.4                   
[61] R6_2.5.1                          GenomicAlignments_1.31.0          dplyr_1.0.7                       rtracklayer_1.55.0                fastmap_1.1.0                    
[66] utf8_1.2.2                        KernSmooth_2.23-20                permute_0.9-5                     stringi_1.7.5                     parallel_4.2.0                   
[71] Rcpp_1.0.7                        vctrs_0.3.8                       BSgenome.Hsapiens.UCSC.hg38_1.4.4 tidyselect_1.1.1                  sparseMatrixStats_1.7.0     

Pentayouth commented 2 years ago

Maybe there is an issue with my conda environment or something... I'm sorry to have bothered you.

Anyway, thank you so much for your time. You are so nice. Have a nice day ;-)

charles-plessy commented 2 years ago

No worries !

I am going to close this issue, but feel free to reopen it if you can reproduce it on a fresh package.

Pentayouth commented 2 years ago

Hello Plessy, I don't want to bother you, but after switching from the lab Linux server to my personal Windows PC, I still get the same error. Here are the code and the session info.

This time I used CAGEr 2.1.0 instead of 2.0.1, so I installed the GitHub version with devtools::install_local()

rm(list = ls())
# install.packages("pacman")
# install.packages("installr")
library(pacman)
# require(installr)
# updateR()
p_unload(CAGEr)
devtools::install_local("./CAGEr-master.zip") # try to install github version
library(CAGEr)
library(rtracklayer)
library(dplyr)

# BiocManager::install("CAGEr", force = T)
# chooseBioCmirror()
# BiocManager::install("BSgenome.Hsapiens.UCSC.hg38")

inputFiles = paste0(getwd(),"/",c("CJC","XXZ"),"/rrna_rm.markdup.bam"); file.exists(inputFiles)
ce_nepc <- CAGEexp(genomeName = "BSgenome.Hsapiens.UCSC.hg38",
                   inputFiles = inputFiles,
                   inputFilesType = "bamPairedEnd",
                   sampleLabels = c("cjc","xxz")); ce_nepc

getCTSS(ce_nepc, correctSystematicG = F)

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CAGEr_2.1.1                 pacman_0.5.1                MultiAssayExperiment_1.20.0
 [4] SummarizedExperiment_1.24.0 Biobase_2.54.0              GenomicRanges_1.46.0       
 [7] GenomeInfoDb_1.30.0         IRanges_2.28.0              S4Vectors_0.32.2           
[10] BiocGenerics_0.40.0         MatrixGenerics_1.6.0        matrixStats_0.61.0         

loaded via a namespace (and not attached):
 [1] nlme_3.1-153              bitops_1.0-7              fs_1.5.0                  usethis_2.1.3            
 [5] devtools_2.4.2            rprojroot_2.0.2           tools_4.1.2               utf8_1.2.2               
 [9] R6_2.5.1                  vegan_2.5-7               KernSmooth_2.23-20        mgcv_1.8-38              
[13] colorspace_2.0-2          permute_0.9-5             withr_2.4.2               tidyselect_1.1.1         
[17] prettyunits_1.1.1         processx_3.5.2            compiler_4.1.2            cli_3.1.0                
[21] desc_1.4.0                DelayedArray_0.20.0       rtracklayer_1.54.0        scales_1.1.1             
[25] callr_3.7.0               stringr_1.4.0             Rsamtools_2.10.0          stringdist_0.9.8         
[29] XVector_0.34.0            pkgconfig_2.0.3           sessioninfo_1.2.1         sparseMatrixStats_1.6.0  
[33] fastmap_1.1.0             BSgenome_1.62.0           rlang_0.4.12              rstudioapi_0.13          
[37] VGAM_1.1-5                DelayedMatrixStats_1.16.0 BiocIO_1.4.0              generics_0.1.1           
[41] BiocParallel_1.28.0       gtools_3.9.2              dplyr_1.0.7               RCurl_1.98-1.5           
[45] magrittr_2.0.1            GenomeInfoDbData_1.2.7    Matrix_1.3-4              Rcpp_1.0.7               
[49] munsell_0.5.0             fansi_0.5.0               lifecycle_1.0.1           stringi_1.7.5            
[53] yaml_2.2.1                MASS_7.3-54               zlibbioc_1.40.0           pkgbuild_1.2.0           
[57] plyr_1.8.6                grid_4.1.2                formula.tools_1.7.1       parallel_4.1.2           
[61] crayon_1.4.2              lattice_0.20-45           Biostrings_2.62.0         splines_4.1.2            
[65] ps_1.6.0                  pillar_1.6.4              rjson_0.2.20              reshape2_1.4.4           
[69] pkgload_1.2.3             XML_3.99-0.8              glue_1.4.2                data.table_1.14.2        
[73] remotes_2.4.1             operator.tools_1.6.3      vctrs_0.3.8               testthat_3.1.0           
[77] gtable_0.3.0              purrr_0.3.4               cachem_1.0.6              ggplot2_3.3.5            
[81] restfulr_0.0.13           tibble_3.1.5              som_0.3-5.1               GenomicAlignments_1.30.0 
[85] memoise_2.0.0             cluster_2.1.2             ellipsis_0.3.2  

I've sent you the bam files I used that contain reads from all chromosomes by e-mail in case you are interested.

Best regards. Wang

charles-plessy commented 2 years ago

Thanks for your investigation! I could reproduce the bug with 2.0.1 in a different container.

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bookworm/sid

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CAGEr_2.0.1                 MultiAssayExperiment_1.20.0 SummarizedExperiment_1.24.0 Biobase_2.54.0              GenomicRanges_1.46.0        GenomeInfoDb_1.30.0        
 [7] IRanges_2.28.0              S4Vectors_0.32.2            BiocGenerics_0.40.0         MatrixGenerics_1.6.0        matrixStats_0.61.0         

loaded via a namespace (and not attached):
 [1] VGAM_1.1-5                        splines_4.1.1                     DelayedMatrixStats_1.16.0         gtools_3.9.2                      assertthat_0.2.1                 
 [6] BiocManager_1.30.16               BSgenome_1.62.0                   GenomeInfoDbData_1.2.7            Rsamtools_2.10.0                  yaml_2.2.1                       
[11] pillar_1.6.4                      lattice_0.20-45                   glue_1.5.0                        XVector_0.34.0                    colorspace_2.0-2                 
[16] Matrix_1.3-4                      plyr_1.8.6                        XML_3.99-0.8                      pkgconfig_2.0.3                   zlibbioc_1.40.0                  
[21] purrr_0.3.4                       scales_1.1.1                      stringdist_0.9.8                  BiocParallel_1.28.0               tibble_3.1.6                     
[26] mgcv_1.8-38                       generics_0.1.1                    ggplot2_3.3.5                     ellipsis_0.3.2                    cachem_1.0.6                     
[31] formula.tools_1.7.1               magrittr_2.0.1                    crayon_1.4.2                      memoise_2.0.0                     fansi_0.5.0                      
[36] operator.tools_1.6.3              nlme_3.1-153                      MASS_7.3-54                       vegan_2.5-7                       tools_4.1.1                      
[41] data.table_1.14.2                 BiocIO_1.4.0                      lifecycle_1.0.1                   stringr_1.4.0                     munsell_0.5.0                    
[46] cluster_2.1.2                     DelayedArray_0.20.0               Biostrings_2.62.0                 som_0.3-5.1                       compiler_4.1.1                   
[51] rlang_0.4.12                      grid_4.1.1                        RCurl_1.98-1.5                    rstudioapi_0.13                   rjson_0.2.20                     
[56] bitops_1.0-7                      restfulr_0.0.13                   gtable_0.3.0                      DBI_1.1.1                         reshape2_1.4.4                   
[61] R6_2.5.1                          GenomicAlignments_1.30.0          dplyr_1.0.7                       rtracklayer_1.54.0                fastmap_1.1.0                    
[66] utf8_1.2.2                        KernSmooth_2.23-20                permute_0.9-5                     stringi_1.7.5                     parallel_4.1.1                   
[71] Rcpp_1.0.7                        vctrs_0.3.8                       BSgenome.Hsapiens.UCSC.hg38_1.4.4 tidyselect_1.1.1                  sparseMatrixStats_1.6.0

charles-plessy commented 2 years ago

I think that I isolated the problem. I do not understand why the error did not trigger earlier on my machine. Basically, the promoters function fails on GPos objects, and the CTSS class is wrapping the GPos class...

> promoters(CTSScoordinatesGR(exampleCAGEexp))
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘update_ranges’ for signature ‘"UnstitchedIPos"’
> promoters(GRanges(CTSScoordinatesGR(exampleCAGEexp)))
GRanges object with 5000 ranges and 5 metadata columns:
         seqnames            ranges strand |  genes annotation exprClass filteredCTSSidx                cluster
            <Rle>         <IRanges>  <Rle> |  <Rle>      <Rle>     <Rle>           <Rle>                  <Rle>
     [1]    chr17 26025430-26027629      + |           unknown       0_0            TRUE       chr17:26027430:+
     [2]    chr17 26048540-26050739      + | grid1a   promoter       4_4            TRUE       chr17:26050540:+
     [3]    chr17 26116088-26118287      + | grid1a       exon       0_0            TRUE       chr17:26118088:+
     [4]    chr17 26140853-26143052      + | grid1a     intron       0_4            TRUE       chr17:26142853:+
     [5]    chr17 26164954-26167153      + | grid1a       exon       0_0            TRUE       chr17:26166954:+
     ...      ...               ...    ... .    ...        ...       ...             ...                    ...
  [4996]    chr17 32706950-32709149      + | ywhaqb       exon       0_2            TRUE chr17:32708847-32708..
  [4997]    chr17 32706953-32709152      + | ywhaqb       exon       0_2            TRUE chr17:32708847-32708..
  [4998]    chr17 32706955-32709154      + | ywhaqb       exon       0_2            TRUE chr17:32708847-32708..
  [4999]    chr17 32706957-32709156      + | ywhaqb       exon       0_4            TRUE chr17:32708847-32708..
  [5000]    chr17 32706958-32709157      + | ywhaqb       exon       0_2            TRUE chr17:32708847-32708..
  -------
  seqinfo: 26 sequences (1 circular) from danRer7 genome

For the record, I am asking upstream if this failure is expected: https://support.bioconductor.org/p/9140695/
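To illustrate the kind of S4 dispatch failure shown above, here is a toy sketch using only the methods package. The classes `DemoRanges` and `DemoPos` and the generic `demo_update` are made up for this example; they only mimic the situation where a method exists for a range class but not for a position class, and they are not the Bioconductor classes or internals.

```r
library(methods)

# Toy stand-ins (hypothetical): a range-like class and a position-like class.
setClass("DemoRanges", representation(start = "integer", width = "integer"))
setClass("DemoPos", representation(pos = "integer"))

# A generic with a method defined for the range class only.
setGeneric("demo_update", function(x, ...) standardGeneric("demo_update"))
setMethod("demo_update", "DemoRanges", function(x, ...) {
  x@start <- x@start - 1L
  x
})

# Explicit coercion from positions to width-1 ranges.
setAs("DemoPos", "DemoRanges", function(from) {
  new("DemoRanges", start = from@pos, width = rep(1L, length(from@pos)))
})

p <- new("DemoPos", pos = c(10L, 20L))

# Calling the generic on the position class fails: no method is found.
err <- tryCatch(demo_update(p), error = function(e) conditionMessage(e))

# Coercing first, as in promoters(GRanges(...)) above, lets dispatch succeed.
fixed <- demo_update(as(p, "DemoRanges"))
```

Here `err` holds an "unable to find an inherited method" message, while the coerced call returns normally.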

Pentayouth commented 2 years ago

Hello Plessy, thank you for your reply. I'm very glad that I could help. There are promoters() functions in SummarizedExperiment, GenomicFeatures, GenomicRanges and IRanges, so I wonder which one is actually called here. Also, why does the error only occur in certain circumstances?

Best Regards, Wang

charles-plessy commented 2 years ago

I think that I fixed the issue in the master branch. Can you install it and check you can load your data? If yes I will port the fix to the stable version, which will be 2.0.2.

Pentayouth commented 2 years ago

Hello Plessy,

Sorry for the late reply.

I have tested the latest branch, everything works fine!

Thank you for all your kindness and patience.

Best

Wang


> devtools::install_local("./CAGEr-master.zip")
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

1: All                           
2: CRAN packages only            
3: None                          
4: glue   (1.4.2 -> 1.5.0) [CRAN]
5: tibble (3.1.5 -> 3.1.6) [CRAN]

Enter one or more numbers, or an empty line to skip updates: library(CAGEr)
Enter one or more numbers, or an empty line to skip updates: 3
* installing *source* package 'CAGEr' ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
  converting help for package 'CAGEr'
    finding HTML links ... done
    CAGEexp-class                           html  
    finding level-2 HTML links ... done

    CAGEr-class                             html  
    CAGEr-package                           html  
    CAGEr_Multicore                         html  
    CTSS-class                              html  
    CTSSclusteringMethod                    html  
    CTSScoordinates                         html  
    CTSScumulativesTagClusters              html  
    CTSSnormalizedTpm                       html  
    CTSStagCount                            html  
    CTSStoGenes                             html  
    ConsensusClusters-class                 html  
    CustomConsensusClusters                 html  
    FANTOM5humanSamples                     html  
    FANTOM5mouseSamples                     html  
    GeneExpDESeq2                           html  
    GeneExpSE                               html  
    QuantileWidthFunctions                  html  
    TagClusters-class                       html  
    aggregateTagClusters                    html  
    annotateCTSS                            html  
    bam2CTSS                                html  
    byCtss                                  html  
    clusterAggregateAndSum                  html  
    clusterCTSS                             html  
    coerceInBSgenome                        html  
    consensusClusterConvertors              html  
    consensusClusters-set                   html  
    consensusClusters                       html  
    consensusClustersDESeq2                 html  
    consensusClustersQuantile               html  
    consensusClustersTpm                    html  
    coverage-functions                      html  
    cumulativeCTSSdistribution              html  
    distclu-functions                       html  
    exampleCAGEexp                          html  
    exampleZv9_annot                        html  
    exportToTrack                           html  
    expressionClasses                       html  
    genomeName                              html  
    getCTSS                                 html  
    getExpressionProfiles                   html  
    getShiftingPromoters                    html  
    hanabi-class                            html  
    hanabi                                  html  
REDIRECT:topic   Previous alias or file overwritten by alias: C:/Program Files/R/R-4.1.2/library/00LOCK-CAGEr-master/00new/CAGEr/help/hanabi+2Clist-method.html
    hanabiPlot                              html  
    import.CAGEscanMolecule                 html  
    import.CTSS                             html  
    import.bam                              html  
    import.bam.ctss                         html  
    import.bedCTSS                          html  
    import.bedScore                         html  
    import.bedmolecule                      html  
    inputFiles                              html  
    inputFilesType                          html  
    librarySizes                            html  
    loadFileIntoGPos                        html  
    mapStats                                html  
    mapStatsScopes                          html  
    mergeCAGEsets                           html  
    mergeSamples                            html  
    moleculesGR2CTSS                        html  
    normalizeTagCount                       html  
    parseCAGEscanBlocksToGrangeTSS          html  
    plot.hanabi                             html  
    plotAnnot                               html  
    plotCorrelation                         html  
    plotExpressionProfiles                  html  
    plotInterquantileWidth                  html  
    plotReverseCumulatives                  html  
    powerLaw                                html  
    quantilePositions                       html  
    ranges2annot                            html  
    ranges2genes                            html  
    ranges2names                            html  
    sampleLabels                            html  
    scoreShift                              html  
    seqNameTotalsSE                         html  
    setColors                               html  
    strandInvaders                          html  
    summariseChrExpr                        html  
    tagClusters                             html  
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
*** arch - i386
*** arch - x64
** testing if installed package can be loaded from final location
*** arch - i386
*** arch - x64
** testing if installed package keeps a record of temporary installation path
* DONE (CAGEr)

> library(CAGEr)
Loading required package: MultiAssayExperiment
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins,
    colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads,
    colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges,
    colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs,
    rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs,
    rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads,
    rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated,
    eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match,
    mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames,
    sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:grDevices’:

    windows

Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor,
    see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

    rowMedians

The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians

> getwd()
[1] "C:/Users/PtaYoth/Desktop/cage"
> inputFiles = paste0(getwd(),"/",c("CJC","XXZ"),"/rrna_rm.markdup.bam"); file.exists(inputFiles)
[1] TRUE TRUE
> ce_nepc <- CAGEexp(genomeName = "BSgenome.Hsapiens.UCSC.hg38",
+                    inputFiles = inputFiles,
+                    inputFilesType = "bamPairedEnd",
+                    sampleLabels = c("cjc","xxz")); ce_nepc
A CAGEexp object of 0 listed
 experiments with no user-defined names and respective classes.
 Containing an ExperimentList class object of length 0:
 Functionality:
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save data to flat files
> getCTSS(ce_nepc, correctSystematicG = F)

Reading in file: C:/Users/PtaYoth/Desktop/cage/CJC/rrna_rm.markdup.bam...
    -> Filtering out low quality reads...
Loading required namespace: BSgenome.Hsapiens.UCSC.hg38
    -> Removing the first base of the reads if 'G' and not aligned to the genome...

Reading in file: C:/Users/PtaYoth/Desktop/cage/XXZ/rrna_rm.markdup.bam...
    -> Filtering out low quality reads...
    -> Removing the first base of the reads if 'G' and not aligned to the genome...
A CAGEexp object of 1 listed
 experiment with a user-defined name and respective class.
 Containing an ExperimentList class object of length 1:
 [1] tagCountMatrix: RangedSummarizedExperiment with 348135 rows and 2 columns
Functionality:
 experiments() - obtain the ExperimentList instance
 colData() - the primary/phenotype DataFrame
 sampleMap() - the sample coordination DataFrame
 `$`, `[`, `[[` - extract colData columns, subset, or experiment
 *Format() - convert into a long or wide DataFrame
 assays() - convert ExperimentList to a SimpleList of matrices
 exportClass() - save data to flat files

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CAGEr_2.1.1                 MultiAssayExperiment_1.20.0 SummarizedExperiment_1.24.0
 [4] Biobase_2.54.0              GenomicRanges_1.46.0        GenomeInfoDb_1.30.0        
 [7] IRanges_2.28.0              S4Vectors_0.32.2            BiocGenerics_0.40.0        
[10] MatrixGenerics_1.6.0        matrixStats_0.61.0         

loaded via a namespace (and not attached):
 [1] nlme_3.1-153                      bitops_1.0-7                      fs_1.5.0                         
 [4] usethis_2.1.3                     devtools_2.4.2                    rprojroot_2.0.2                  
 [7] tools_4.1.2                       utf8_1.2.2                        R6_2.5.1                         
[10] vegan_2.5-7                       KernSmooth_2.23-20                mgcv_1.8-38                      
[13] DBI_1.1.1                         colorspace_2.0-2                  permute_0.9-5                    
[16] withr_2.4.2                       tidyselect_1.1.1                  prettyunits_1.1.1                
[19] processx_3.5.2                    compiler_4.1.2                    cli_3.1.0                        
[22] desc_1.4.0                        DelayedArray_0.20.0               rtracklayer_1.54.0               
[25] scales_1.1.1                      callr_3.7.0                       stringr_1.4.0                    
[28] Rsamtools_2.10.0                  stringdist_0.9.8                  XVector_0.34.0                   
[31] pkgconfig_2.0.3                   sessioninfo_1.2.1                 sparseMatrixStats_1.6.0          
[34] fastmap_1.1.0                     BSgenome_1.62.0                   rlang_0.4.12                     
[37] rstudioapi_0.13                   VGAM_1.1-5                        DelayedMatrixStats_1.16.0        
[40] BiocIO_1.4.0                      generics_0.1.1                    BiocParallel_1.28.0              
[43] gtools_3.9.2                      dplyr_1.0.7                       RCurl_1.98-1.5                   
[46] magrittr_2.0.1                    GenomeInfoDbData_1.2.7            Matrix_1.3-4                     
[49] Rcpp_1.0.7                        munsell_0.5.0                     fansi_0.5.0                      
[52] lifecycle_1.0.1                   stringi_1.7.5                     yaml_2.2.1                       
[55] MASS_7.3-54                       zlibbioc_1.40.0                   pkgbuild_1.2.0                   
[58] plyr_1.8.6                        grid_4.1.2                        formula.tools_1.7.1              
[61] parallel_4.1.2                    crayon_1.4.2                      lattice_0.20-45                  
[64] Biostrings_2.62.0                 splines_4.1.2                     BSgenome.Hsapiens.UCSC.hg38_1.4.4
[67] ps_1.6.0                          pillar_1.6.4                      rjson_0.2.20                     
[70] reshape2_1.4.4                    pkgload_1.2.3                     XML_3.99-0.8                     
[73] glue_1.4.2                        data.table_1.14.2                 remotes_2.4.1                    
[76] operator.tools_1.6.3              vctrs_0.3.8                       testthat_3.1.0                   
[79] gtable_0.3.0                      purrr_0.3.4                       assertthat_0.2.1                 
[82] cachem_1.0.6                      ggplot2_3.3.5                     restfulr_0.0.13                  
[85] tibble_3.1.5                      som_0.3-5.1                       GenomicAlignments_1.30.0         
[88] memoise_2.0.0                     cluster_2.1.2                     ellipsis_0.3.2  
charles-plessy commented 2 years ago

Thank you too; the size-reduced BAM files that you sent me were instrumental in solving that bug.