LabTranslationalArchitectomics / riboWaltz

optimization of ribosome P-site positioning in ribosome profiling data
MIT License

region_psite() error #63

Closed weishwu closed 2 years ago

weishwu commented 2 years ago

I'm getting an error with region_psite:

riboWaltz::region_psite(reads_psite_list, annotation_dt)
Error in as.data.frame.default(x, ...) : 
  cannot coerce class ‘"function"’ to a data.frame

Other functions, such as codon_coverage, rlength_distr and frame_psite_length worked.
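
For reference, base R raises this message whenever as.data.frame() is handed a function object instead of data. The sketch below reproduces the message with base R only. It uses `mean` as an arbitrary stand-in and is not taken from riboWaltz internals, so it illustrates the class of error rather than its cause in this report.

```r
# Base R only: coercing a function (here `mean`, an arbitrary stand-in)
# to a data.frame fails in as.data.frame.default().
err <- tryCatch(
  as.data.frame(mean),
  error = function(e) conditionMessage(e)  # keep only the error message
)
print(err)
```

When this message appears, checking class() of each object passed to the failing call is a quick first step, since a name that should hold data may be resolving to a function.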

I ran psite_info to generate reads_psite_list. The top lines of reads_psite_list and annotation_dt are:

> reads_psite_list
$SRR619082
                 transcript end5 psite end3 length cds_start cds_stop psite_from_start
       1: ENST00000616016.5  284   296  308     25       510     3044             -214
       2: ENST00000616016.5  284   296  308     25       510     3044             -214
       3: ENST00000616016.5  834   846  858     25       510     3044              336
       4: ENST00000616016.5 1287  1299 1311     25       510     3044              789
       5: ENST00000616016.5 2346  2358 2370     25       510     3044             1848
      ---                      

> head(annotation_dt)
           transcript l_tr l_utr5 l_cds l_utr3
1: ENST00000000233.10 1032     88   543    401
2:  ENST00000000412.8 2450    159   834   1457
3: ENST00000000442.11 2274    225  1272    777
4:  ENST00000001008.6 3715    170  1380   2165
5:  ENST00000001146.7 4556     28  1539   2989
6:  ENST00000002125.9 2184     48  1326    810

annotation_dt was generated from Gencode GTF and subsetted to protein-coding transcripts:

> annotation_dt_all <- create_annotation('../../ref_data/gencode.v42.annotation.gtf')
> pc_trx <- read.table('../../ref_data/gencode.v42.pc_transcripts.fa.transcript_ids.txt')
> annotation_dt <- annotation_dt_all %>% filter(transcript %in% pc_trx[,1])
> class(annotation_dt)
[1] "data.table" "data.frame"
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.1   forcats_0.5.2   stringr_1.4.1   dplyr_1.0.10    purrr_0.3.5    
 [6] readr_2.1.3     tidyr_1.2.1     tibble_3.1.8    ggplot2_3.4.0   tidyverse_1.3.2
[11] riboWaltz_1.2.0

loaded via a namespace (and not attached):
 [1] matrixStats_0.62.0          bitops_1.0-7                fs_1.5.2                   
 [4] lubridate_1.8.0             bit64_4.0.5                 filelock_1.0.2             
 [7] progress_1.2.2              httr_1.4.4                  GenomeInfoDb_1.34.3        
[10] tools_4.2.0                 backports_1.4.1             utf8_1.2.2                 
[13] R6_2.5.1                    DBI_1.1.3                   BiocGenerics_0.44.0        
[16] colorspace_2.0-3            withr_2.5.0                 tidyselect_1.2.0           
[19] prettyunits_1.1.1           bit_4.0.5                   curl_4.3.3                 
[22] compiler_4.2.0              cli_3.4.1                   rvest_1.0.3                
[25] Biobase_2.58.0              xml2_1.3.3                  DelayedArray_0.24.0        
[28] labeling_0.4.2              rtracklayer_1.58.0          scales_1.2.1               
[31] rappdirs_0.3.3              Rsamtools_2.14.0            digest_0.6.30              
[34] XVector_0.38.0              pkgconfig_2.0.3             MatrixGenerics_1.10.0      
[37] dbplyr_2.2.1                fastmap_1.1.0               rlang_1.0.6                
[40] readxl_1.4.1                rstudioapi_0.13             RSQLite_2.2.18             
[43] farver_2.1.1                BiocIO_1.8.0                generics_0.1.3             
[46] jsonlite_1.8.3              BiocParallel_1.32.1         googlesheets4_1.0.1        
[49] RCurl_1.98-1.9              magrittr_2.0.3              GenomeInfoDbData_1.2.9     
[52] Matrix_1.4-1                Rcpp_1.0.9                  munsell_0.5.0              
[55] S4Vectors_0.36.0            fansi_1.0.3                 lifecycle_1.0.3            
[58] yaml_2.3.6                  stringi_1.7.8               SummarizedExperiment_1.28.0
[61] zlibbioc_1.44.0             BiocFileCache_2.6.0         grid_4.2.0                 
[64] blob_1.2.3                  parallel_4.2.0              crayon_1.5.2               
[67] lattice_0.20-45             Biostrings_2.66.0           haven_2.5.1                
[70] GenomicFeatures_1.50.2      hms_1.1.2                   KEGGREST_1.38.0            
[73] pillar_1.8.1                GenomicRanges_1.50.1        rjson_0.2.21               
[76] codetools_0.2-18            biomaRt_2.54.0              stats4_4.2.0               
[79] reprex_2.0.2                XML_3.99-0.12               glue_1.6.2                 
[82] data.table_1.14.6           modelr_0.1.9                png_0.1-7                  
[85] vctrs_0.5.1                 tzdb_0.3.0                  cellranger_1.1.0           
[88] gtable_0.3.1                assertthat_0.2.1            cachem_1.0.6               
[91] broom_1.0.1                 restfulr_0.0.15             googledrive_2.0.0          
[94] gargle_1.2.1                GenomicAlignments_1.34.0    AnnotationDbi_1.60.0       
[97] memoise_2.0.1               IRanges_2.32.0              ellipsis_0.3.2   

data.zip

fabiolauria commented 2 years ago

Hi there, thank you for using riboWaltz and for the well-described issue.

I ran the following code, based on the data you attached above and on the latest riboWaltz version on GitHub:

annotation_dt <- readRDS("PATH/annotation_dt.rds")
reads_psite_list <- readRDS("PATH/reads_psite_list_test.rds")
test <- region_psite(reads_psite_list, annotation_dt) 

Aside from a ggplot-related warning, I obtained this:

> test
$plot

$dt
   region     count    sample percentage  class
1: 5' UTR         4 SRR619082  33.333333 mapped
2: 5' UTR         2 SRR619083  10.000000 mapped
3: 5' UTR  17132481      RNAs   7.612826    rna
4:    CDS         8 SRR619082  66.666667 mapped
5:    CDS        18 SRR619083  90.000000 mapped
6:    CDS 106949646      RNAs  47.523129    rna
7: 3' UTR 100965443      RNAs  44.864045    rna

and the following plot:

image

This is exactly what I would expect from the region_psite function. I was not able to replicate the error you reported, even after unloading some packages. Did you try running region_psite separately on the two samples? Does the error appear every time? If so, the only option is for you to send me the whole reads_psite_list so that I can try again, see whether I get the same error and, if so, proceed step by step to find its cause.
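
Running the function on one sample at a time could be sketched as follows. `run_per_sample` is a hypothetical helper written for this illustration, not part of riboWaltz. It passes each sample as a one-element named list and records any error message instead of stopping. The toy data at the end only demonstrates the helper.

```r
# Hypothetical helper (not part of riboWaltz): apply `fun` to each sample of a
# named list separately and capture any error instead of aborting.
run_per_sample <- function(data_list, fun, ...) {
  out <- lapply(names(data_list), function(s) {
    tryCatch(
      # data_list[s] keeps the one-element *named list* structure
      list(ok = TRUE, result = fun(data_list[s], ...)),
      error = function(e) list(ok = FALSE, message = conditionMessage(e))
    )
  })
  names(out) <- names(data_list)
  out
}

# Toy demonstration: sample "a" succeeds, sample "b" fails inside sum().
toy <- list(a = 1:3, b = "x")
res <- run_per_sample(toy, function(x) sum(x[[1]]))
print(vapply(res, function(x) x$ok, logical(1)))

# Intended use with the objects from this thread (requires riboWaltz):
# res <- run_per_sample(reads_psite_list, riboWaltz::region_psite, annotation_dt)
```

A sample whose entry has `ok = FALSE` carries the error text in `message`, which narrows down whether the problem depends on the data or on the session.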

Here is the info on my R session:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252    LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C                   LC_TIME=Italian_Italy.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.4.0     data.table_1.14.2

loaded via a namespace (and not attached):
 [1] fansi_0.4.1      withr_2.5.0      dplyr_1.0.2      utf8_1.1.4       crayon_1.3.4     grid_4.0.3       R6_2.5.0         lifecycle_1.0.3  gtable_0.3.0    
[10] magrittr_2.0.1   scales_1.2.1     pillar_1.7.0     rlang_1.0.6      cli_3.4.1        farver_2.0.3     rstudioapi_0.13  generics_0.1.0   vctrs_0.5.1     
[19] ellipsis_0.3.2   labeling_0.4.2   tools_4.0.3      glue_1.4.2       purrr_0.3.4      munsell_0.5.0    compiler_4.0.3   pkgconfig_2.0.3  colorspace_2.0-0
[28] tidyselect_1.1.2 tibble_3.0.4

Best,
Fabio

weishwu commented 2 years ago

Thanks for looking into this, @fabiolauria. I tried it in R and it works! I don't know why it doesn't work in RStudio.
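
The cause of the difference between the two front ends was not established in this thread. As a sketch for comparing the two sessions, the base R calls below list masked names and leftover workspace objects. Session state as the cause is an assumption, not a confirmed diagnosis, and `filter` is used only as an example name because it appears in the code above.

```r
# Names defined in more than one attached environment (possible masking).
masked <- conflicts(detail = TRUE)
print(names(masked))

# Objects currently in the global environment, e.g. restored from a saved
# workspace; any of them could shadow a name a package expects to find.
workspace_objects <- ls(globalenv())
print(workspace_objects)

# Which attached environments define a given name, in search-path order.
print(find("filter"))
```

Running the same lines in both R and RStudio, together with sessionInfo(), shows whether the two sessions differ in attached packages or in objects left in the workspace.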