Closed GuidoBarzaghi closed 3 years ago
@GuidoBarzaghi I have tried to repruduce this, but could not:
library(QuasR)
td <- tempdir()
file.copy(system.file("extdata", package = "QuasR"), td, recursive=TRUE)
samplefile <- file.path(td, "extdata", "samples_chip_single.txt")
genomefile <- file.path(td, "extdata", "hg19sub.fa")
p1 <- qAlign(samplefile, genomefile)
p2 <- qAlign(samplefile, genomefile, bisulfite = "undir")
samplefile2 <- file.path(td, "extdata", "samples_rna_paired.txt")
p3 <- qAlign(samplefile2, genomefile, bisulfite = "undir")
p4 <- qAlign(samplefile2, genomefile, bisulfite = "undir", paired = "fr")
p1@aligner
p2@aligner
p3@aligner
p4@aligner
All four projects have the aligner
slot set correctly. Could you provide more details what you mean by "doesn't retain the aligner information", ideally with a reproducible example code and session information?
My tests were done using the most recent release of QuasR
:
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /tungstenfs/groups/gbioinfo/Appz/easybuild/software/OpenBLAS/0.3.7-GCC-8.3.0/lib/libopenblas_skylakex-r0.3.7.so
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] QuasR_1.30.0 Rbowtie_1.30.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.2 IRanges_2.24.1 S4Vectors_0.28.1
[7] BiocGenerics_0.36.0 RColorBrewer_1.1-2
loaded via a namespace (and not attached):
[1] MatrixGenerics_1.2.0 Biobase_2.50.0 httr_1.4.2 bit64_4.0.5
[5] GenomicFiles_1.26.0 assertthat_0.2.1 askpass_1.1 BiocManager_1.30.10
[9] BiocFileCache_1.14.0 latticeExtra_0.6-29 blob_1.2.1 BSgenome_1.58.0
[13] GenomeInfoDbData_1.2.4 Rsamtools_2.6.0 progress_1.2.2 pillar_1.4.7
[17] RSQLite_2.2.3 lattice_0.20-41 glue_1.4.2 XVector_0.30.0
[21] Matrix_1.2-18 XML_3.99-0.5 pkgconfig_2.0.3 ShortRead_1.48.0
[25] biomaRt_2.46.1 zlibbioc_1.36.0 purrr_0.3.4 jpeg_0.1-8.1
[29] BiocParallel_1.24.1 tibble_3.0.5 openssl_1.4.3 generics_0.1.0
[33] ellipsis_0.3.1 cachem_1.0.1 SummarizedExperiment_1.20.0 GenomicFeatures_1.42.1
[37] fortunes_1.5-4 magrittr_2.0.1 crayon_1.3.4 memoise_2.0.0
[41] xml2_1.3.2 hwriter_1.3.2 tools_4.0.3 prettyunits_1.1.1
[45] hms_1.0.0 lifecycle_0.2.0 matrixStats_0.57.0 stringr_1.4.0
[49] DelayedArray_0.16.1 AnnotationDbi_1.52.0 Biostrings_2.58.0 compiler_4.0.3
[53] rlang_0.4.10 grid_4.0.3 RCurl_1.98-1.2 VariantAnnotation_1.36.0
[57] rappdirs_0.3.1 bitops_1.0-6 DBI_1.1.1 curl_4.3
[61] R6_2.5.0 GenomicAlignments_1.26.0 dplyr_1.0.3 rtracklayer_1.50.0
[65] fastmap_1.1.0 bit_4.0.4 stringi_1.5.3 Rcpp_1.0.6
[69] vctrs_0.3.6 png_0.1-7 dbplyr_2.0.0 tidyselect_1.1.0
Thanks a lot for your exceptionally quick reply. I did try to update QuasR to its latest version but that did not solve it.
This is how I align the data
prj <- qAlign(sampleFile=INPUT_FILE_NAME,
genome="BSgenome.Mmusculus.UCSC.mm10",
aligner = "Rbowtie",
paired="fr",
bisulfite="undir",
projectName="prj",
alignmentParameter = "-e 70 -X 1000 -k 2 --best -strata",
alignmentsDir="./",
snpFile = SNPs,
clObj = cl)
And this is how I define a prj for downstream analysis
prj <- qAlign(sampleFile=INPUT_FILE_NAME,
genome="BSgenome.Mmusculus.UCSC.mm10",
aligner = "Rbowtie",
paired="fr",
bisulfite="undir",
projectName="prj",
checkOnly=F)
Left like this prj@aligner
equals to NA
sessionInfo()
R Under development (unstable) (2021-01-17 r79842)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /g/funcgen/gbcs/public/software/rstudio-server/R-devel-2021-01-19/lib64/R/lib/libRblas.so
LAPACK: /g/funcgen/gbcs/public/software/rstudio-server/R-devel-2021-01-19/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome_1.59.2 rtracklayer_1.51.4 Biostrings_2.59.2 XVector_0.31.1 QuasR_1.31.0 Rbowtie_1.31.6
[8] GenomicRanges_1.43.3 GenomeInfoDb_1.27.3 IRanges_2.25.6 S4Vectors_0.29.6 BiocGenerics_0.37.0
loaded via a namespace (and not attached):
[1] fs_1.5.0 bitops_1.0-6 matrixStats_0.57.0 usethis_2.0.0 devtools_2.3.2 bit64_4.0.5 filelock_1.0.2 RColorBrewer_1.1-2
[9] progress_1.2.2 httr_1.4.2 rprojroot_2.0.2 GenomicFiles_1.27.0 tools_4.1.0 R6_2.5.0 DBI_1.1.1 withr_2.3.0
[17] processx_3.4.5 tidyselect_1.1.0 prettyunits_1.1.1 bit_4.0.4 curl_4.3 compiler_4.1.0 cli_2.2.0 Biobase_2.51.0
[25] xml2_1.3.2 desc_1.2.0 DelayedArray_0.17.7 callr_3.5.1 askpass_1.1 rappdirs_0.3.1 stringr_1.4.0 digest_0.6.27
[33] Rsamtools_2.7.0 rmarkdown_2.6 jpeg_0.1-8.1 pkgconfig_2.0.3 htmltools_0.5.0 sessioninfo_1.1.1 MatrixGenerics_1.3.0 dbplyr_2.0.0
[41] rlang_0.4.10 rstudioapi_0.13 RSQLite_2.2.2 BiocIO_1.1.2 generics_0.1.0 hwriter_1.3.2 BiocParallel_1.25.2 dplyr_1.0.3
[49] VariantAnnotation_1.37.1 RCurl_1.98-1.2 magrittr_2.0.1 GenomeInfoDbData_1.2.4 Matrix_1.3-2 Rcpp_1.0.5 fansi_0.4.2 lifecycle_0.2.0
[57] stringi_1.5.3 yaml_2.2.1 SummarizedExperiment_1.21.1 zlibbioc_1.37.0 pkgbuild_1.2.0 BiocFileCache_1.15.1 grid_4.1.0 blob_1.2.1
[65] crayon_1.3.4 lattice_0.20-41 GenomicFeatures_1.43.3 hms_1.0.0 ps_1.5.0 knitr_1.30 pillar_1.4.7 rjson_0.2.20
[73] pkgload_1.1.0 biomaRt_2.47.1 XML_3.99-0.5 glue_1.4.2 evaluate_0.14 ShortRead_1.49.1 latticeExtra_0.6-29 remotes_2.2.0
[81] BiocManager_1.30.10 vctrs_0.3.6 png_0.1-7 testthat_3.0.1 openssl_1.4.3 purrr_0.3.4 assertthat_0.2.1 xfun_0.20
[89] restfulr_0.0.13 tibble_3.0.5 GenomicAlignments_1.27.2 AnnotationDbi_1.53.0 memoise_1.1.0 ellipsis_0.3.1
Ok, I think I understand now. I think this is simply because you use two different ways to define prf
. The idea is that you use the exact same qAlign
-call also for downstream analysis. QuasR
should identify pre-existing bam files and would not re-create them.
Could you check if prj@aligner
is set correctly in the first case?
Ah I see. You are exactly right, that worked. Thanks a lot for your support :) In that case though, when coming to QuasR directly for downstream analyses without knowing what exact call was used for alignments, the aligner must still be specified separately
I cannot run your code examples, for example I don't have the file INPUT_FILE_NAME
.
Could you provide a reproducible example?
My guess is that maybe the samples file contains bam files, and thus the aligner slot is not set.
Exactly, sorry for not being clear.
When starting a QuasR project with aligned bam files for which I do not have the original fastqs, the aligner slot must be filled separately (that's all I wanted to understand).
Is allowing to define the aligner directly in the qAlign call in this case as well something you might contemplate?
Actually, what you experience is expected behaviour: For a "bamfile-project", QuasR
has no way to validate the aligner, and therefore chooses not to set that field. The field is only set if QuasR
can guarantee its correctness.
I think setting it from the qAlign(..., aligner)
argument would often do the wrong thing - aligner
defaults to "Rbowtie" and would thus set it to that value even if the bam files were generated using another aligner.
You can of course set the @aligner
slot as you have done above, or preferentially, create your project using the original qAlign
call.
Thank you very much for the clarification
I'm not sure whether this is a desired behaviour or not, but I've noticed that calling qAlign as follows on aligned data results on a QuasRprj object which doesn't retain the aligner information
prj <- qAlign(sampleFile=QuasR_pointer, genome="BSgenome.Mmusculus.UCSC.mm10", aligner = "Rbowtie", paired="fr", bisulfite="undir", projectName="prj", checkOnly=F)
Later reassignment as
prj@aligner = "Rbowtie"
readily solves the issue anyway.