Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

ATACqc #332

Closed jianhong closed 7 years ago

jianhong commented 7 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 7 years ago

Hi @jianhong

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: ATACqc
Type: Package
Title: ATAC sequencing Quality Control
Version: 0.99.0
Author: Jianhong Ou, Jun Yu, Nathan Lawson, Lihua Julie Zhu
Maintainer: Jianhong Ou <jianhong.ou@umassmed.edu>
Description: Assay for Transposase-Accessible Chromatin using sequencing 
    (ATAC-seq) is a alternative or complementary technique to MNase-seq 
    (sequencing of micrococcal nuclease sensitive sites). It is rapid and
    sensitive method for chromatin accessibility analysis. We collect 
    codes for ATAC-seq analysis following the report of Greenleaf Lab,
    in order to help users to do quick quaulity control for their data,
    which including fragment size distribution, nucleosome positioning,
    and CTCF or other Transcript Factor footprints.
Depends:
    R (>= 3.4),
    IRanges,
    GenomicRanges,
    S4Vectors
Imports:
    BSgenome,
    Biostrings,
    ChIPpeakAnno,
    GenomicFeatures,
    GenomicAlignments,
    GenomeInfoDb,
    graphics,
    motifStack,
    Rsamtools,
    SummarizedExperiment,
    VariantFiltering,
    gtools,
    limma,
    randomForest,
    rtracklayer,
    stats,
    stringr,
    trackViewer,
    grid
Suggests:
    RUnit,
    BiocStyle,
    knitr,
    BSgenome.Hsapiens.UCSC.hg19,
    TxDb.Hsapiens.UCSC.hg19.knownGene,
    phastCons100way.UCSC.hg19,
    MotifDb
License: GPL (>= 2)
LazyData: TRUE
VignetteBuilder: knitr
RoxygenNote: 6.0.1
biocViews: Sequencing, DNASeq, GeneRegulation, QualityControl, 
 Coverage, NucleosomePositioning
bioc-issue-bot commented 7 years ago

Your package has been approved for building. Your package is now submitted to our queue.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/ATACqc_buildreport_20170327182116.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

b3ff4c7 Try to fix the bug "object inverted is not exporte...

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/ATACqc_buildreport_20170327231002.html

jianhong commented 7 years ago

Dear @hpages ,

I have trouble in debug the ERROR. Could you help me to figure what is the problem? Thank you.

I tried two methods, one is update my R to most recently daily build and another one is bioconductor/devel_core2:latest docker. Here is the sessionInfo of my development environment:

local:

R Under development (unstable) (2017-03-27 r72423) Platform: x86_64-apple-darwin16.4.0 (64-bit) Running under: macOS Sierra 10.12.3

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] ATACqc_0.99.1 GenomicRanges_1.27.23 GenomeInfoDb_1.11.9
[4] IRanges_2.9.19 S4Vectors_0.13.15 BiocGenerics_0.21.3
[7] devtools_1.12.0

loaded via a namespace (and not attached): [1] ProtGenerics_1.7.0 bitops_1.0-6
[3] matrixStats_0.51.0 RColorBrewer_1.1-2
[5] httr_1.2.1 backports_1.0.5
[7] tools_3.5.0 rGADEM_2.23.0
[9] R6_2.2.0 rpart_4.1-11
[11] seqLogo_1.41.0 Hmisc_4.0-2
[13] DBI_0.6 Gviz_1.19.3
[15] lazyeval_0.2.0 colorspace_1.3-2
[17] ade4_1.7-6 nnet_7.3-12
[19] motifStack_1.19.2 withr_1.0.2
[21] gridExtra_2.2.1 compiler_3.5.0
[23] git2r_0.18.0 VennDiagram_1.6.17
[25] graph_1.53.0 Biobase_2.35.1
[27] htmlTable_1.9 grImport_0.9-0
[29] DelayedArray_0.1.7 rtracklayer_1.35.9
[31] scales_0.4.1 checkmate_1.8.2
[33] randomForest_4.6-12 pbapply_1.3-2
[35] RBGL_1.51.0 stringr_1.2.0
[37] digest_0.6.12 Rsamtools_1.27.13
[39] foreign_0.8-67 XVector_0.15.2
[41] dichromat_2.0-0 base64enc_0.1-3
[43] htmltools_0.3.5 ensembldb_1.99.12
[45] limma_3.31.19 BSgenome_1.43.7
[47] regioneR_1.7.3 htmlwidgets_0.8
[49] RSQLite_1.1-2 BiocInstaller_1.25.3
[51] shiny_1.0.0 gtools_3.5.0
[53] BiocParallel_1.9.5 acepack_1.4.1
[55] magrittr_1.5 VariantAnnotation_1.21.17
[57] RCurl_1.95-4.8 GO.db_3.4.0
[59] GenomeInfoDbData_0.99.0 Formula_1.2-1
[61] futile.logger_1.4.3 Matrix_1.2-9
[63] trackViewer_1.11.6 Rcpp_0.12.10
[65] munsell_0.4.3 stringi_1.1.3
[67] yaml_2.1.14 MASS_7.3-47
[69] SummarizedExperiment_1.5.7 zlibbioc_1.21.0
[71] plyr_1.8.4 AnnotationHub_2.7.14
[73] grid_3.5.0 lattice_0.20-35
[75] Biostrings_2.43.6 splines_3.5.0
[77] multtest_2.31.0 GenomicFeatures_1.27.10
[79] knitr_1.15.1 MotIV_1.31.0
[81] seqinr_3.3-3 biomaRt_2.31.4
[83] futile.options_1.0.0 XML_3.98-1.5
[85] biovizBase_1.23.2 latticeExtra_0.6-28
[87] data.table_1.10.4 lambda.r_1.1.9
[89] idr_1.2 httpuv_1.3.3
[91] gtable_0.2.0 assertthat_0.1
[93] ggplot2_2.2.1 mime_0.5
[95] xtable_1.8-2 survival_2.41-2
[97] ChIPpeakAnno_3.9.16 tibble_1.2
[99] GenomicAlignments_1.11.12 AnnotationDbi_1.37.4
[101] memoise_1.0.0 cluster_2.0.6
[103] interactiveDisplayBase_1.13.0 VariantFiltering_1.11.4

docker:

R Under development (unstable) (2017-03-26 r72404) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default BLAS: /usr/local/lib/R/lib/libRblas.so LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] ATACqc_0.99.1 GenomicRanges_1.27.23 GenomeInfoDb_1.11.9
[4] IRanges_2.9.19 S4Vectors_0.13.15 BiocGenerics_0.21.3
[7] devtools_1.12.0 BiocInstaller_1.25.3

loaded via a namespace (and not attached): [1] ProtGenerics_1.7.0 bitops_1.0-6
[3] matrixStats_0.51.0 RColorBrewer_1.1-2
[5] httr_1.2.1 backports_1.0.5
[7] tools_3.5.0 rGADEM_2.23.0
[9] R6_2.2.0 rpart_4.1-10
[11] seqLogo_1.41.0 Hmisc_4.0-2
[13] DBI_0.6 Gviz_1.19.3
[15] lazyeval_0.2.0 colorspace_1.3-2
[17] ade4_1.7-6 nnet_7.3-12
[19] motifStack_1.19.2 withr_1.0.2
[21] gridExtra_2.2.1 curl_2.4
[23] compiler_3.5.0 git2r_0.18.0
[25] VennDiagram_1.6.17 graph_1.53.0
[27] Biobase_2.35.1 htmlTable_1.9
[29] grImport_0.9-0 DelayedArray_0.1.7
[31] rtracklayer_1.35.9 scales_0.4.1
[33] checkmate_1.8.2 randomForest_4.6-12
[35] pbapply_1.3-2 RBGL_1.51.0
[37] stringr_1.2.0 digest_0.6.12
[39] Rsamtools_1.27.13 foreign_0.8-67
[41] XVector_0.15.2 dichromat_2.0-0
[43] base64enc_0.1-3 htmltools_0.3.5
[45] ensembldb_1.99.12 limma_3.31.19
[47] BSgenome_1.43.7 regioneR_1.7.3
[49] htmlwidgets_0.8 RSQLite_1.1-2
[51] shiny_1.0.0 gtools_3.5.0
[53] BiocParallel_1.9.5 acepack_1.4.1
[55] magrittr_1.5 VariantAnnotation_1.21.17
[57] RCurl_1.95-4.8 GO.db_3.4.0
[59] GenomeInfoDbData_0.99.0 Formula_1.2-1
[61] futile.logger_1.4.3 Matrix_1.2-8
[63] trackViewer_1.11.6 Rcpp_0.12.10
[65] munsell_0.4.3 stringi_1.1.3
[67] yaml_2.1.14 MASS_7.3-45
[69] SummarizedExperiment_1.5.7 zlibbioc_1.21.0
[71] plyr_1.8.4 AnnotationHub_2.7.14
[73] grid_3.5.0 lattice_0.20-35
[75] Biostrings_2.43.6 splines_3.5.0
[77] multtest_2.31.0 GenomicFeatures_1.27.10
[79] knitr_1.15.1 MotIV_1.31.0
[81] seqinr_3.3-3 biomaRt_2.31.4
[83] futile.options_1.0.0 XML_3.98-1.5
[85] biovizBase_1.23.2 latticeExtra_0.6-28
[87] data.table_1.10.4 lambda.r_1.1.9
[89] idr_1.2 httpuv_1.3.3
[91] gtable_0.2.0 assertthat_0.1
[93] ggplot2_2.2.1 mime_0.5
[95] xtable_1.8-2 survival_2.41-2
[97] ChIPpeakAnno_3.9.16 tibble_1.2
[99] GenomicAlignments_1.11.12 AnnotationDbi_1.37.4
[101] memoise_1.0.0 cluster_2.0.6
[103] interactiveDisplayBase_1.13.0 VariantFiltering_1.11.4

lawremi commented 7 years ago

The splitBam() function needs a lot more explanation. Why does it need to know the conservation scores, transcripts, etc? Looks like it's using random forest on the fragment sizes and conservation... seems a lot more complicated than how it is presented. You should probably split it up into pieces: the reading, the classification/splitting, and the writing, with all 3 wrapped up in a convenience function. You're already close to that after exporting the splitBamByCut() function, although I would suggest renaming that, since it does not operate on a BAM file.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

06fa31f rewrite splitBam function by just call shiftBam, s...

jianhong commented 7 years ago

Dear @lawremi ,

I split the splitBam function into shiftBam, splitGAlignmentsByCut and writeGAlignmentsList. And add some details about this function.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/ATACqc_buildreport_20170329142803.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

1192b7e try to avoid error in windows that splitBam ask to...

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/ATACqc_buildreport_20170329151335.html

lawremi commented 7 years ago

I would further split apart the BAM reading (maybe the user can just do this without any help) from the shifting. Also, the seqlevs argument should probably be dropped. Just let the user restrict during BAM reading. More docs are still needed.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

ad954d0 change shiftBam to shiftGAlignmentsList

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/ATACqc_buildreport_20170331190108.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

14de5df add bindingSites argument for factorFootprints

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/ATACqc_buildreport_20170401171529.html

hpages commented 7 years ago

Hi Jianhong Ou,

Thanks for submitting ATACqc. I had a 1st look at the package and have some feedback. See below for the details. Some effort is needed to make the package use less resources (item 1) and to reduce its dependencies (item 2). I will take a 2nd look at the package once these issues are addressed.

Regards, H.

  1. Building the vignette used about 5 Gb of RAM and took about 11 min on the Linux server where I did this (an Ubuntu 14.04 server with 56 cores and 384 Gb). This is a little bit too much resources and it might cause problems in the context of our daily builds where we build and check in parallel 1300+ packages every day. Please make adjustments to the vignette so it can be built using a more reasonable amount of resources. In particular, please try to reduce the memory used. Ideally it should not use more than 2 Gb of RAM (it will fail on 32-bit Windows if it uses more than that).

  2. Loading the package with library(ATACqc) takes more than 13 sec on my laptop! This is because of the package very high number (> 100) of direct and indirect dependencies as reported by sessionInfo():

    > length(sessionInfo()$loadedOnly)
    [1] 102

    The package explicitely imports 18 packages. It seems to me that some of them (e.g. GenomicFeatures) don't need to be imported because you don't use them anywhere internally. For example you have the

     @import GenomicFeatures

    line before 4 functions (enrichedFragments, shiftGAlignmentsList, splitBam, and splitGAlignmentsByCut). However, AFAICT, none of these function makes direct calls to functions defined in GenomicFeatures. This means that GenomicFeatures doesn't need to be in Imports, only in Suggests (because you only use it in the examples). So you might want to revisit your imports and make sure that you only import packages that you actually use directly internally. FWIW here are the packages that you import with the most dependencies: trackViewer, VariantFiltering, ChIPpeakAnno, motifStack, and GenomicFeatures. Don't import these packages lightly. If you use them only in one or two of your functions that don't provide a core functionality of your package, please consider moving them to your Suggests field. For example you could move motifStack to the Suggests field and then replace

    plotMotifLogoA(motif)

    with

    motifStack::plotMotifLogoA(motif)

    inside the plotFootprints() function.

  3. I got an error when I tried to build the vignette on a system where Pandoc (>= 1.12.3) and/or pandoc-citeproc not available. This error can be worked around by replacing:

    date: "`r doc_date()`"
    package: "`r pkg_ver('ATACqc')`"

    with:

    date: "`r BiocStyle::doc_date()`"
    package: "`r BiocStyle::pkg_ver('ATACqc')`"
  4. Your code contains many tests like this:

     class(positive)=="integer"
     class(txs)=="GRanges"
     class(bamIn)!="GRangesList"
     class(genome)=="BSgenome"
     class(.ele)!="GAlignments"
     etc...

    Testing for class equality is not a good idea and is considered bad practice. That's because the test will fail for an object that belongs to a class that extends the expected class. For example, if txs belongs to a class that extends GRanges, then test class(txs)=="GRanges" will fail. But there is really no reason to not accept such an object. An object that extends GRanges is a GRanges object and anything you can do with an object of class GRanges (a.k.a. a GRanges instance) you can still do with an object that extends GRanges. This is why the recommended practice is to test with is(). For example:

    is(txs, "GRanges")
    !is(bamIn, "GRangesList")
    is(genome, "BSgenome")
    !is(.ele, "GAlignments")

    For integer objects, the recommended practice is to test with is.integer():

    is.integer(positive)
bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/ATACqc_buildreport_20170405074058.html

jianhong commented 7 years ago

Hi,

Will it too late to change the package name from ATACqc to ATACseqQC? If not, what should I do to make the change?

And Thank you @hpages . I am working on the package according your comments.

hpages commented 7 years ago

It's not too late. If you decide to do so, start a new submission and link it to the new package (i.e. to https://github.com/jianhong/ATACseqQC). Please do this ASAP. We don't have much time left before the release. Thanks! H.

jianhong commented 7 years ago

Hi @hpages @lawremi I re-submitted the package to ATACseqQC. And I did some changes:

  1. tried to decrease the RAM requirement. On my mac, the RAM is lower than 4G.
  2. I removed several packages from import list. And move motifStack as suggest package.
  3. I change the date and package in vignette according @hpages codes.
  4. change all checks for class to is
hpages commented 7 years ago

closing the issue (package has been resubmitted as ATACseqQC)