Bioconductor / Contributions

REMP #121

Closed YinanZheng closed 7 years ago

YinanZheng commented 8 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 8 years ago

Hi @YinanZheng

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: REMP
Type: Package
Title: Repetitive Element Methylation Prediction
Version: 0.99.0
Authors@R: c(person("Yinan","Zheng", role = c("aut","cre"), email = "y-zheng@northwestern.edu"),person("Lifang","Hou", role = c("cph"), email = "l-hou@northwestern.edu"))
Author: Yinan Zheng <y-zheng@northwestern.edu> [aut, cre], Lifang Hou <l-hou@northwestern.edu> [cph]
Maintainer: Yinan Zheng <y-zheng@northwestern.edu>
Description: Predicting locus-specific repetitive element methylation based on Illumina 450K/EPIC Methylation BeadChip Array.
License: file LICENSE
Depends:
    R (>= 3.3.0),
    data.table,
    doParallel,
    caret
Imports:
    AnnotationHub,
    GenomicRanges,
    IRanges,
    minfi,
    BSgenome,
    BSgenome.Hsapiens.UCSC.hg19,
    IlluminaHumanMethylation450kanno.ilmn12.hg19,
    IlluminaHumanMethylationEPICanno.ilm10b2.hg19,
    impute,
    stringi,
    randomForest,
    kernlab,
Suggests:
    REMPdata,
    knitr,
    rmarkdown
VignetteBuilder: knitr
URL: https://github.com/YinanZheng/REMP
BugReports: https://github.com/YinanZheng/REMP/issues
LazyData: true
SystemRequirements: GNU make
biocViews: DNAMethylation, Microarray, MethylationArray, GenePrediction
RoxygenNote: 5.0.1

bioc-issue-bot commented 8 years ago

Your package has been approved for building. Your package is now submitted to our queue.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20160920055121.html

lawremi commented 8 years ago

  1. Should use BiocParallel instead of foreach. The usage of foreach is bizarre, where strings of R code are parsed and evaluated. Much better to use a functional API.
  2. Uses stringi instead of Biostrings for searching through sequences.
  3. Any use of data.table needs strong justification.
  4. Use message() instead of cat().
  5. Gets refseq gene database directly from UCSC. Please consider using AH, rtracklayer, or a TxDb package. You really don't want to get data that is uploaded weekly. For reproducibility and consistency, having versioned data resources is critical.
  6. This package appears to be specific to hg19; is that all that the 450k platform supports?
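
Point 4 can be illustrated with a minimal base-R sketch. The function names `report_with_cat` and `report_with_message` are hypothetical and not taken from REMP. The practical difference is that message() signals a condition that callers can catch or silence, whereas cat() writes straight to standard output.

```r
# Hypothetical functions for illustration only; not from the REMP package.
report_with_cat <- function() {
  cat("Running prediction...\n")      # written directly to stdout
  invisible(NULL)
}

report_with_message <- function() {
  message("Running prediction...")    # signalled as a 'message' condition
  invisible(NULL)
}

# cat() output can only be intercepted by capturing stdout.
cat_out <- capture.output(report_with_cat())

# message() output is a condition, so a caller can catch it ...
caught <- tryCatch(report_with_message(),
                   message = function(m) conditionMessage(m))

# ... or silence it entirely.
suppressMessages(report_with_message())
```

Users of a package can therefore wrap a call in suppressMessages() when they do not want progress output, which is not possible when the function uses cat().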

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20160920121700.html

YinanZheng commented 8 years ago

Hi lawremi, Thanks for the review. In the next version:

  1. I will use BiocParallel to rewrite the parallel computing part.
  2. Did you mean to use Biostrings instead?
  3. I will use DataFrame instead.
  4. I will use message() instead.
  5. I will use AnnotationHub to fetch the database.
  6. Yes, both the Illumina 450k and EPIC platforms support hg19. The Illumina 450k and EPIC manifest and annotation packages available in Bioconductor are also based on hg19.
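
The contrast behind point 1 can be sketched in base R. REMP's original foreach code is not shown in this thread, so the code string and the function `sum_chunk` below are hypothetical stand-ins. With BiocParallel, the lapply() call would become bplapply() with the same function argument.

```r
x <- list(1:3, 4:6)

# Discouraged: assemble R code as a string, then parse and evaluate it.
code <- "lapply(x, function(v) sum(v))"
res_string <- eval(parse(text = code))

# Preferred: pass a function directly to a functional API.
sum_chunk <- function(v) sum(v)
res_fun <- lapply(x, sum_chunk)
```

Both calls return the same result, but the second form can be checked by R's parser, debugged, and handed unchanged to a parallel back end.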

bioc-issue-bot commented 8 years ago

Hi @YinanZheng,

Starting build on additional package https://github.com/YinanZheng/REMPdata.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your additional package repository will NOT trigger a new build.

The DESCRIPTION file of this additional package is:

Package: REMPdata
Version: 0.99.0
Title: Example data for REMP package
Authors@R: c(person("Yinan","Zheng", role = c("aut","cre"), email = "y-zheng@northwestern.edu"), person("Lifang","Hou", role = c("cph"), email = "l-hou@northwestern.edu"))
Author: Yinan Zheng <y-zheng@northwestern.edu> [aut, cre], Lifang Hou <l-hou@northwestern.edu> [cph]
Maintainer: Yinan Zheng <y-zheng@northwestern.edu>
Description: This package contains HapMap LCL GM12878 methylation data profiled by Illumina 450K array and EPIC Methylation BeadChip Array.
Depends: R (>= 3.3.0), S4Vectors
License: file LICENSE
LazyData: true
biocViews: Homo_sapiens_Data, MethylationArrayData, HapMap, ENCODE
NeedsCompilation: no
RoxygenNote: 5.0.1

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMPdata_buildreport_20160920164218.html

mtmorgan commented 8 years ago

Thank you for your contribution.

DESCRIPTION

vignette

R

man

mtmorgan commented 8 years ago

Do you plan to revise your package in time for the current release? The deadline is today.

YinanZheng commented 8 years ago

I am still working on the revision. Thanks for the notice though.

I have revised the parallel part using the BiocParallel package. However, I came across a minor issue: when I use bplapply, the package loading messages always show up multiple times (once per backend worker). I understand that each worker needs these packages because the applied function uses them:

Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

IQR, mad, xtabs

The following objects are masked from 'package:base':

anyDuplicated, append, as.data.frame, cbind, colnames, do.call,
duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect,
is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit

But these loading messages are unnecessary and push my function's diagnostic messages out of view. Is there a way to turn them off?

mtmorgan commented 8 years ago

Can you provide an example of code that I can run and that illustrates the problem?

YinanZheng commented 8 years ago

The task is to locate every "CG" dinucleotide in a very large DNAStringSet object. Here is a simplified example:

library(BiocParallel)
library(Biostrings)
library(BSgenome)

# SEQ.RE is the huge DNAStringSet object.
SEQ.RE<-getSeq(BSgenome.Hsapiens.UCSC.hg19::Hsapiens, 
               names = rep("chr1",4), 
               start = c(16777161,25165801,150994893,167772065),
               end = c(16777470,25166089,150995191,167772362), 
               strand = c("+","-","-","+")) 

# Define the apply function
.RECpGPos <- function(seq, CpG)
{
  start(matchPattern(CpG, seq))
}

bp = bpparam()
bpstart(bp)
RE.CpG <- bplapply(SEQ.RE, .RECpGPos, BPPARAM = bp, CpG = DNAString("CG"))
bpstop(bp)

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] BSgenome_1.40.1 rtracklayer_1.32.2 GenomicRanges_1.24.3 GenomeInfoDb_1.8.7 Biostrings_2.40.2 XVector_0.12.1 IRanges_2.6.1 S4Vectors_0.10.3 BiocGenerics_0.18.0
[10] BiocParallel_1.6.6

loaded via a namespace (and not attached): [1] XML_3.98-1.4 snow_0.4-1 BSgenome.Hsapiens.UCSC.hg19_1.4.0 Rsamtools_1.24.0 bitops_1.0-6
[6] GenomicAlignments_1.8.4 zlibbioc_1.18.0 tools_3.3.1 Biobase_2.32.0 RCurl_1.95-4.8
[11] SummarizedExperiment_1.2.3

The package loading message appeared 6 times on my PC.

Thanks for the help.

mtmorgan commented 8 years ago

The startup message comes when the serialized instance is sent to the workers; it's hard to avoid this other than by pre-loading the packages on the workers, e.g.,

bp = SnowParam(2)
bpstart(bp)
bplapply(seq_len(bpnworkers(bp)), function(i) {
    suppressPackageStartupMessages({
        library(Biostrings)
    })
}, BPPARAM=bp)
RE.CpG <- bplapply(SEQ.RE, .RECpGPos, BPPARAM = bp, CpG = DNAString("CG"))
bpstop(bp)

Sending large amounts of data back and forth to workers is not advised; for example, it is better to send only the ranges for getSeq() and retrieve the large sequence data on the worker rather than on the manager. It's often MUCH better to use vectorized functions when they exist, rather than iteration like lapply(). Also, when a vectorized version of the function exists, it's better to use bpvec() rather than bplapply(); bpvec() splits the data into chunks that are each processed by the vectorized function, rather than sending individual elements. So

.vRECpGPos <- function(seq, CpG)
{
    start(vmatchPattern(CpG, seq))
}
.vRECpGPos(SEQ.RE, CpG=DNAString("CG"))

and only if the above is slow

bpvec(SEQ.RE, .vRECpGPos, CpG=DNAString("CG"))

YinanZheng commented 8 years ago

Dear Martin, Thank you so much! The vectorized function works like a charm. But I still need to use bpvec to boost the speed (.vRECpGPos took ~40s). bpvec is about 35% faster than bplapply.

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

3a80daa 0.99.1 Revisions based on the comments to the ini...
f8b3ef8 0.99.1

bioc-issue-bot commented 8 years ago

We only start builds when the Version field in the DESCRIPTION file is incremented. For example, by changing

Version: 0.99.0

to

Version: 0.99.1

If you did not intend to start a build, you don't need to do anything. If you did want to start a build, increment the Version: field and try again.
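
The version comparison can be tried locally before pushing. This is a minimal sketch using base R's `package_version()`; the builder's own implementation is not shown in this thread, the variable names are illustrative, and it assumes that "incremented" means strictly greater.

```r
old_version <- package_version("0.99.0")
new_version <- package_version("0.99.1")

# Assumption: a build starts only if the new version is strictly greater.
triggers_build <- new_version > old_version

# Versions compare numerically by component, not as character strings,
# so 0.99.10 counts as newer than 0.99.9 even though the strings sort
# the other way round.
numeric_order <- package_version("0.99.10") > package_version("0.99.9")
string_order  <- "0.99.10" > "0.99.9"
```

Here `triggers_build` and `numeric_order` are TRUE while `string_order` is FALSE, which matters once the last component reaches two digits.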

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161113164731.html

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

7fa7616 0.99.2

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161113202441.html

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

37d81e5 0.99.3
559f925 0.99.3

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161113210450.html

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

5e686d2 0.99.4

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161114010639.html

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

f7db1a8 0.99.5

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161115063114.html

YinanZheng commented 8 years ago

Hi,

I have revised my package as suggested in the comments above. It builds without errors and passes all checks on my machine, but the build report here shows an error.

I need to use some annotation or data packages, such as IlluminaHumanMethylation450kanno.ilmn12.hg19, in my package, so I used requireNamespace and :: in my code, as recommended. It works smoothly on my machine, but I don't quite understand what the error message means or why the server cannot find the package. I have tried multiple approaches without success. Could you please advise?
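
The requireNamespace-plus-`::` pattern mentioned above might look like the following sketch. The helper name `.getFromPackage` is hypothetical and not taken from REMP, and `stats` and `median` stand in for the annotation package and its objects so that the example runs without extra installs.

```r
# Hypothetical helper for illustration; not from the REMP package.
.getFromPackage <- function(pkg, name) {
  # Fail early with a clear message if the package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is required but not installed.", call. = FALSE)
  }
  # Programmatic equivalent of pkg::name; does not attach the package.
  getExportedValue(pkg, name)
}

med <- .getFromPackage("stats", "median")
med(c(1, 3, 10))

# A missing package produces an informative error instead of a later failure.
err <- tryCatch(.getFromPackage("notARealPackage123", "x"),
                error = function(e) conditionMessage(e))
```

This keeps the package off the search path, which is why it only works when the package is installed and declared in the DESCRIPTION file.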

Also in my previous request I submitted a linked data package. But now it is no longer necessary thanks to AnnotationHub. It can be removed.

By the way, in my previous attempts, a couple of the packages that my package uses were built under R 3.3.2, while the server may still be on 3.3.1, which could lead to check warnings. Is there anything I can do to avoid this kind of warning?

Thanks!

mtmorgan commented 8 years ago

I think your approach is correct but that minfi requires a bug fix; I suggest that you move the annotation package to the Depends: field of the DESCRIPTION file until the minfi package is fixed. Please do not worry about the use of 3.3.1 / 3.3.2, the single package builder is in the process of being updated.

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

4f95c87 0.99.6

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161115224608.html

lshep commented 8 years ago

The "abnormal" result is an error on our end caused by updating the single package builder. Once the update is complete, I will manually kick off a build of your package.

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161116093213.html

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMPdata_buildreport_20161116093416.html

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161116095229.html

bioc-issue-bot commented 8 years ago

Received a valid push; starting a build. Commits are:

af4726b 0.99.7

bioc-issue-bot commented 8 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161120054730.html

YinanZheng commented 8 years ago

Hi @mtmorgan ,

I don't quite understand how to fix the current errors (http://bioconductor.org/spb_reports/REMP_buildreport_20161120054730.html). Could you please take a look at them? Before this submission, my code passed R CMD check with 0 errors, 0 warnings, and 2 notes on my PC, Mac, and Linux server.

Thanks.

YinanZheng commented 8 years ago

It seems there are some issues when AnnotationHub tries to create a local cache on the build server ('Permission denied').

YinanZheng commented 8 years ago

Could you please help with the issues? Thanks a lot.

lshep commented 7 years ago

We tried installing some system dependency on our end. I will manually kick off a new build to see if it helps any of the ERRORS.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161128113518.html

YinanZheng commented 7 years ago

@lshep Thank you very much!

But the 'Permission denied' issue still exists on Windows:

Warning in dir.create(cache, recursive = TRUE) : cannot create dir 'C:\Users\Default\Documents\AppData', reason 'Permission denied'
Warning in dir.create(dirname(cachepath), recursive = TRUE) : cannot create dir 'C:\Users\Default\Documents\AppData', reason 'Permission denied'
Warning in file.create(to[okay]) : cannot create file 'C:/Users/Default/Documents/AppData/.AnnotationHub/annotationhub.sqlite3', reason 'No such file or directory'

I think "dir.create(cache, recursive = TRUE)" is called from AnnotationHub. The system refused to create the cache directory for the AnnotationHub database, which caused the downstream 'No such file or directory' error.
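
Whether a given directory can be created can be checked in isolation with base R. This is only a diagnostic sketch: the helper `canCreateDir` is hypothetical, it does not call AnnotationHub, and the path under tempdir() is an illustrative stand-in for the cache location in the warnings above.

```r
# Hypothetical helper: TRUE if 'path' already exists or can be created,
# FALSE otherwise (for example when permission is denied).
canCreateDir <- function(path) {
  if (dir.exists(path)) {
    return(TRUE)
  }
  # dir.create() returns FALSE and warns on failure; the warning is
  # suppressed here because the logical result is all that is needed.
  suppressWarnings(dir.create(path, recursive = TRUE))
}

# Illustrative path only; the real cache location is chosen by AnnotationHub.
candidate <- file.path(tempdir(), "AppData", ".AnnotationHub")
writable <- canCreateDir(candidate)
```

Running the same check against the directory named in the warning would show whether the failure is specific to that location on the build machine.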

For Linux and Mac:

Can I ignore the warning:

For Mac:

caught segfault address 0x7fef50b879e0, cause 'memory not mapped'

I think a similar issue happened here when running AnnotationHub.

YinanZheng commented 7 years ago

Can anyone help with this issue? Thank you very much.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

f38bc0e 0.99.8

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161206142229.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

17e524a 0.99.9

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161206165058.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

5866546 0.99.10

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/REMP_buildreport_20161206194656.html

bioc-issue-bot commented 7 years ago

We only start builds when the Version field in the DESCRIPTION file is incremented. For example, by changing

Version: 0.99.0

to

Version: 0.99.1

If you did not intend to start a build, you don't need to do anything. If you did want to start a build, increment the Version: field and try again.