LabTranslationalArchitectomics / riboWaltz

optimization of ribosome P-site positioning in ribosome profiling data
MIT License

strange bug in region_psite() #40

Closed yeroslaviz closed 3 years ago

yeroslaviz commented 3 years ago

When running the command I get this strange error:

> example_psite_region <- region_psite(data = reads_psite_list, annotation = annotation_dt, sample = "IPA")
Error in as.data.frame.default(x, ...) : 
  cannot coerce class ‘"function"’ to a data.frame

But this can't be, as both input objects are correct:

> class(annotation_dt)
[1] "data.table" "data.frame"
> head(annotation_dt, 3)
   transcript l_tr l_utr5 l_cds l_utr3
1:    2RSSE.7  192    123    69      0
2:   6R55.2.1  296    152   144      0
3:   AC3.10.1 1224    198   999     27

> lapply(reads_psite_list, head, n=3)
$IPA
   transcript end5 psite end3 length cds_start cds_stop psite_from_start psite_from_stop psite_region
1:  F15H9.2.1 1035   345 1046     12        58     1056            -2056           -3186         5utr
2:  F15H9.2.1 1035   345 1046     12        58     1056            -2056           -3186         5utr
3:  F15H9.2.1 1035   345 1046     12        58     1056            -2056           -3186         5utr

$IPB
   transcript end5 psite end3 length cds_start cds_stop psite_from_start psite_from_stop psite_region
1:  C09D4.6.1  257    46  268     12        58      432              -27            -488         5utr
2:  F15H9.2.1 1035    46 1046     12        58     1056              -27            -488         5utr
3:  T06D4.4.1 1966    46 1977     12       553     2505              -27            -488         5utr

$TotalA
   transcript end5 psite end3 length cds_start cds_stop psite_from_start psite_from_stop psite_region
1:  F15H9.2.1 1035   229 1046     12        58     1056            -2172           -3302         5utr
2:  F15H9.2.1 1035   229 1046     12        58     1056            -2172           -3302         5utr
3:  F15H9.2.1 1035   229 1046     12        58     1056            -2172           -3302         5utr

$TotalB
     transcript end5 psite end3 length cds_start cds_stop psite_from_start psite_from_stop psite_region
1:    F15H9.2.1 1035  2737 1046     12        58     1056              336            -794          cds
2:    F15H9.2.1 1035  2737 1046     12        58     1056              336            -794          cds
3: Y46G5A.22a.1  718    49  729     12       621      929              -24            -485         5utr

The same error happens also when running the example code for the command itself.

Any ideas?

Assa

yeroslaviz commented 3 years ago
─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       macOS Big Sur 10.16         
 system   x86_64, darwin17.0          
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Berlin               
 date     2021-07-16                  

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version  date       lib source                                                   
 assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.1.0)                                           
 backports              1.2.1    2020-12-09 [1] CRAN (R 4.1.0)                                           
 Biobase                2.52.0   2021-05-19 [1] Bioconductor                                             
 BiocGenerics         * 0.38.0   2021-05-19 [1] Bioconductor                                             
 BiocParallel           1.26.1   2021-07-04 [1] Bioconductor                                             
 Biostrings           * 2.60.1   2021-06-06 [1] Bioconductor                                             
 bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.1.0)                                           
 broom                  0.7.8    2021-06-24 [1] CRAN (R 4.1.0)                                           
 bslib                  0.2.5.1  2021-05-18 [1] CRAN (R 4.1.0)                                           
 cachem                 1.0.5    2021-05-15 [1] CRAN (R 4.1.0)                                           
 callr                  3.7.0    2021-04-20 [1] CRAN (R 4.1.0)                                           
 cellranger             1.1.0    2016-07-27 [1] CRAN (R 4.1.0)                                           
 class                  7.3-19   2021-05-03 [1] CRAN (R 4.1.0)                                           
 classInt               0.4-3    2020-04-07 [1] CRAN (R 4.1.0)                                           
 cli                    3.0.0    2021-06-30 [1] CRAN (R 4.1.0)                                           
 colorspace             2.0-2    2021-06-24 [1] CRAN (R 4.1.0)                                           
 cowplot                1.1.1    2020-12-30 [1] CRAN (R 4.1.0)                                           
 crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.1.0)                                           
 data.table           * 1.14.0   2021-02-21 [1] CRAN (R 4.1.0)                                           
 DBI                    1.1.1    2021-01-15 [1] CRAN (R 4.1.0)                                           
 dbplyr                 2.1.1    2021-04-06 [1] CRAN (R 4.1.0)                                           
 DelayedArray           0.18.0   2021-05-19 [1] Bioconductor                                             
 desc                   1.3.0    2021-03-05 [1] CRAN (R 4.1.0)                                           
 devtools             * 2.4.2    2021-06-07 [1] CRAN (R 4.1.0)                                           
 digest                 0.6.27   2020-10-24 [1] CRAN (R 4.1.0)                                           
 dplyr                * 1.0.7    2021-06-18 [1] CRAN (R 4.1.0)                                           
 e1071                  1.7-7    2021-05-23 [1] CRAN (R 4.1.0)                                           
 ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.1.0)                                           
 evaluate               0.14     2019-05-28 [1] CRAN (R 4.1.0)                                           
 fansi                  0.5.0    2021-05-25 [1] CRAN (R 4.1.0)                                           
 farver                 2.1.0    2021-02-28 [1] CRAN (R 4.1.0)                                           
 fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.1.0)                                           
 forcats              * 0.5.1    2021-01-27 [1] CRAN (R 4.1.0)                                           
 formatR                1.11     2021-06-01 [1] CRAN (R 4.1.0)                                           
 fs                     1.5.0    2020-07-31 [1] CRAN (R 4.1.0)                                           
 futile.logger        * 1.4.3    2016-07-10 [1] CRAN (R 4.1.0)                                           
 futile.options         1.0.1    2018-04-20 [1] CRAN (R 4.1.0)                                           
 generics               0.1.0    2020-10-31 [1] CRAN (R 4.1.0)                                           
 GenomeInfoDb         * 1.28.1   2021-07-01 [1] Bioconductor                                             
 GenomeInfoDbData       1.2.6    2021-06-15 [1] Bioconductor                                             
 GenomicAlignments      1.28.0   2021-05-19 [1] Bioconductor                                             
 GenomicRanges          1.44.0   2021-05-19 [1] Bioconductor                                             
 ggplot2              * 3.3.5    2021-06-25 [1] CRAN (R 4.1.0)                                           
 ggVennDiagram        * 1.1.0    2021-05-07 [1] CRAN (R 4.1.0)                                           
 glue                   1.4.2    2020-08-27 [1] CRAN (R 4.1.0)                                           
 gtable                 0.3.0    2019-03-25 [1] CRAN (R 4.1.0)                                           
 haven                  2.4.1    2021-04-23 [1] CRAN (R 4.1.0)                                           
 highr                  0.9      2021-04-16 [1] CRAN (R 4.1.0)                                           
 hms                    1.1.0    2021-05-17 [1] CRAN (R 4.1.0)                                           
 htmltools              0.5.1.1  2021-01-22 [1] CRAN (R 4.1.0)                                           
 httr                   1.4.2    2020-07-20 [1] CRAN (R 4.1.0)                                           
 IRanges              * 2.26.0   2021-05-19 [1] Bioconductor                                             
 jquerylib              0.1.4    2021-04-26 [1] CRAN (R 4.1.0)                                           
 jsonlite               1.7.2    2020-12-09 [1] CRAN (R 4.1.0)                                           
 kableExtra           * 1.3.4    2021-02-20 [1] CRAN (R 4.1.0)                                           
 KernSmooth             2.23-20  2021-05-03 [1] CRAN (R 4.1.0)                                           
 knitr                  1.33     2021-04-24 [1] CRAN (R 4.1.0)                                           
 labeling               0.4.2    2020-10-20 [1] CRAN (R 4.1.0)                                           
 lambda.r               1.2.4    2019-09-18 [1] CRAN (R 4.1.0)                                           
 lattice                0.20-44  2021-05-02 [1] CRAN (R 4.1.0)                                           
 lifecycle              1.0.0    2021-02-15 [1] CRAN (R 4.1.0)                                           
 lubridate              1.7.10   2021-02-26 [1] CRAN (R 4.1.0)                                           
 magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.1.0)                                           
 Matrix                 1.4-0    2021-06-11 [1] R-Forge (R 4.1.0)                                        
 MatrixGenerics         1.4.0    2021-05-19 [1] Bioconductor                                             
 matrixStats            0.59.0   2021-06-01 [1] CRAN (R 4.1.0)                                           
 memoise                2.0.0    2021-01-26 [1] CRAN (R 4.1.0)                                           
 modelr                 0.1.8    2020-05-19 [1] CRAN (R 4.1.0)                                           
 munsell                0.5.0    2018-06-12 [1] CRAN (R 4.1.0)                                           
 pillar                 1.6.1    2021-05-16 [1] CRAN (R 4.1.0)                                           
 pkgbuild               1.2.0    2020-12-15 [1] CRAN (R 4.1.0)                                           
 pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.1.0)                                           
 pkgload                1.2.1    2021-04-06 [1] CRAN (R 4.1.0)                                           
 prettyunits            1.1.1    2020-01-24 [1] CRAN (R 4.1.0)                                           
 processx               3.5.2    2021-04-30 [1] CRAN (R 4.1.0)                                           
 proxy                  0.4-26   2021-06-07 [1] CRAN (R 4.1.0)                                           
 ps                     1.6.0    2021-02-28 [1] CRAN (R 4.1.0)                                           
 purrr                * 0.3.4    2020-04-17 [1] CRAN (R 4.1.0)                                           
 R6                     2.5.0    2020-10-28 [1] CRAN (R 4.1.0)                                           
 Rcpp                   1.0.7    2021-07-07 [1] CRAN (R 4.1.0)                                           
 RCurl                  1.98-1.3 2021-03-16 [1] CRAN (R 4.1.0)                                           
 readr                * 1.4.0    2020-10-05 [1] CRAN (R 4.1.0)                                           
 readxl                 1.3.1    2019-03-13 [1] CRAN (R 4.1.0)                                           
 remotes                2.4.0    2021-06-02 [1] CRAN (R 4.1.0)                                           
 reprex                 2.0.0    2021-04-02 [1] CRAN (R 4.1.0)                                           
 riboWaltz            * 1.1.0    2021-07-13 [1] Github (LabTranslationalArchitectomics/riboWaltz@d4bb8de)
 rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.1.0)                                           
 rmarkdown              2.9      2021-06-15 [1] CRAN (R 4.1.0)                                           
 rprojroot              2.0.2    2020-11-15 [1] CRAN (R 4.1.0)                                           
 Rsamtools              2.8.0    2021-05-19 [1] Bioconductor                                             
 rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.1.0)                                           
 RVenn                  1.1.0    2019-07-18 [1] CRAN (R 4.1.0)                                           
 rvest                  1.0.0    2021-03-09 [1] CRAN (R 4.1.0)                                           
 S4Vectors            * 0.30.0   2021-05-19 [1] Bioconductor                                             
 sass                   0.4.0    2021-05-12 [1] CRAN (R 4.1.0)                                           
 scales                 1.1.1    2020-05-11 [1] CRAN (R 4.1.0)                                           
 sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.1.0)                                           
 sf                     1.0-0    2021-06-09 [1] CRAN (R 4.1.0)                                           
 stringi                1.6.2    2021-05-17 [1] CRAN (R 4.1.0)                                           
 stringr              * 1.4.0    2019-02-10 [1] CRAN (R 4.1.0)                                           
 SummarizedExperiment   1.22.0   2021-05-19 [1] Bioconductor                                             
 svglite                2.0.0    2021-02-20 [1] CRAN (R 4.1.0)                                           
 systemfonts            1.0.2    2021-05-11 [1] CRAN (R 4.1.0)                                           
 testthat               3.0.4    2021-07-01 [1] CRAN (R 4.1.0)                                           
 tibble               * 3.1.2    2021-05-16 [1] CRAN (R 4.1.0)                                           
 tidyr                * 1.1.3    2021-03-03 [1] CRAN (R 4.1.0)                                           
 tidyselect             1.1.1    2021-04-30 [1] CRAN (R 4.1.0)                                           
 tidyverse            * 1.3.1    2021-04-15 [1] CRAN (R 4.1.0)                                           
 units                  0.7-2    2021-06-08 [1] CRAN (R 4.1.0)                                           
 usethis              * 2.0.1    2021-02-10 [1] CRAN (R 4.1.0)                                           
 utf8                   1.2.1    2021-03-12 [1] CRAN (R 4.1.0)                                           
 vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.1.0)                                           
 VennDiagram          * 1.6.20   2018-03-28 [1] CRAN (R 4.1.0)                                           
 viridisLite            0.4.0    2021-04-13 [1] CRAN (R 4.1.0)                                           
 webshot                0.5.2    2019-11-22 [1] CRAN (R 4.1.0)                                           
 withr                  2.4.2    2021-04-18 [1] CRAN (R 4.1.0)                                           
 xfun                   0.24     2021-06-15 [1] CRAN (R 4.1.0)                                           
 xml2                   1.3.2    2020-04-23 [1] CRAN (R 4.1.0)                                           
 XVector              * 0.32.0   2021-05-19 [1] Bioconductor                                             
 yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.1.0)                                           
 zlibbioc               1.38.0   2021-05-19 [1] Bioconductor       

fabiolauria commented 3 years ago

Hi Assa. As for the previous issue, I ran region_psite on data from my lab and it worked fine. It seems clear to me that the identification of the P-sites went wrong, since their position within the reads is clearly incorrect. The psite should fall between end5 and end3, while for the IPA sample you have:

end5 = 1035 psite = 345 end3 = 1046

and for TotalB:

end5 = 1035 psite = 2737 end3 = 1046

and a similar situation for the other samples. It might or might not be the reason for the error, but you certainly cannot proceed with this data. Moreover, looking at the fields cds_start, cds_stop, psite_from_start and psite_from_stop, the data tables seem to have been built on multiple, different reference data, or the values in specific columns have somehow been shuffled. It's the first time I've seen something like this, so I'm pretty sure it doesn't depend on riboWaltz. As I already mentioned, it is more likely that passing riboWaltz FASTA and GTF files that do not correspond to the ones used in the alignment has introduced many errors related to the length of the transcripts as well as the start and the end of the CDS.
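
The "psite between end5 and end3" condition can be checked per sample with a few lines of R. This is only a minimal sketch: `check_psite_bounds` is a hypothetical helper, not a riboWaltz function, and the toy rows are copied from the IPA and TotalB output posted earlier in this thread. It relies only on the columns end5, psite and end3, so it should also accept the data tables in reads_psite_list.

```r
# Hypothetical helper (not part of riboWaltz): count reads whose P-site
# does not lie between the 5' and 3' ends of the read.
check_psite_bounds <- function(reads) {
  inside <- reads$psite >= reads$end5 & reads$psite <= reads$end3
  data.frame(n_reads      = nrow(reads),
             n_outside    = sum(!inside),
             frac_outside = mean(!inside))
}

# Toy rows taken from the head() output shown in this issue.
reads_toy <- list(
  IPA    = data.frame(end5 = 1035, psite = 345, end3 = 1046),
  TotalB = data.frame(end5 = c(1035, 718),
                      psite = c(2737, 49),
                      end3 = c(1046, 729))
)

# One summary row per sample; row names are the sample names.
bounds_report <- do.call(rbind, lapply(reads_toy, check_psite_bounds))
print(bounds_report)
```

On the real data the same call would be `lapply(reads_psite_list, check_psite_bounds)`; any sample with a non-zero n_outside points to a problem upstream of region_psite.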

BTW, your reads are a bit too short for a riboSeq experiment, according to the few rows displayed above. I don't think 12 nucleotides are representative of a ribosome footprint, even if we are not talking about eukaryotic organisms. This should be unrelated to riboWaltz, unless even the reading of the BAMs went wrong (you can easily verify this by looking at BAM-derived BED files).
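
A quick way to look at the read lengths from within R, before going back to the BED files, might be the sketch below. `summarise_lengths` is a hypothetical helper, not a riboWaltz function, and it assumes 1-based inclusive coordinates, so that length should equal end3 - end5 + 1 (which is consistent with the rows posted above). The toy rows are the three IPB reads from this thread.

```r
# Hypothetical helper (not part of riboWaltz): compare the reported read
# length with the one implied by the coordinates and tabulate the lengths.
# Assumption: end5 and end3 are 1-based and inclusive.
summarise_lengths <- function(reads) {
  computed <- reads$end3 - reads$end5 + 1
  list(length_matches_coords = all(computed == reads$length),
       length_counts         = table(reads$length))
}

# Toy rows taken from the IPB head() output shown in this issue.
ipb_toy <- data.frame(end5   = c(257, 1035, 1966),
                      end3   = c(268, 1046, 1977),
                      length = c(12, 12, 12))

length_report <- summarise_lengths(ipb_toy)
print(length_report)
```

If the lengths agree with the coordinates but are all around 12 nucleotides, the short reads are already present in the input and are not introduced when the P-site columns are added.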

I'm sorry for all the inconvenience, but there are many things that don't add up. Please check your data and files carefully. If you think everything is fine with them, I'll ask you for your annotation and a chunk of your BAM and FASTA files to try and find out what is going on.

Let me know.

Best Fabio

fabiolauria commented 3 years ago

Hi, I'm going to close this issue. Let me know if you need more help.

Fabio