Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

COCOA #921

Closed j-lawson closed 6 years ago

j-lawson commented 6 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 6 years ago

Hi @j-lawson

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: COCOA
Version: 0.99.3
Date: 2018-10-19
Title: Coordinate Covariation Analysis
Description: COCOA is a method for understanding variation among samples
    and can be used with data that includes genomic coordinates
    such as DNA methylation. On a high level, COCOA uses a database
    of "region sets" and principal component analysis (PCA) of your data
    to identify sources of variation among samples. A region set is a
    set of genomic regions that share a biological annotation, for instance,
    transcription factor binding regions, histone modification regions,
    or open chromatin regions. COCOA works in both supervised
    (known groups of samples) and unsupervised (no groups) situations
    and can be used as a complement to "differential" methods that find
    discrete differences between groups. COCOA can identify biologically
    meaningful sources of variation between samples and
    increase understanding of variation in your data.
Authors@R: c(person("John", "Lawson", email="jtl2hk@virginia.edu", role=c("aut", "cre")), person("Nathan", "Sheffield", role=c("aut"), comment="http://www.databio.org"))
Depends: R (>= 3.5), GenomicRanges
Imports:
    BiocGenerics,
    S4Vectors,  
    IRanges,
    data.table,
    ggplot2,
    Biobase,
    stats,
    methods,
    ComplexHeatmap,
    MIRA,
    tidyr,
    grid,
    grDevices
Suggests:
    knitr,
    parallel,
    testthat,
    BiocStyle,
    rmarkdown,
    AnnotationHub,
    LOLA
VignetteBuilder: knitr
License: GPL-3
biocViews: PrincipalComponent, GenomicVariation, DNAMethylation, GeneRegulation, GenomeAnnotation, 
       SystemsBiology, FunctionalGenomics, ChIPSeq, 
 MethylSeq, Sequencing, Epigenetics
URL: http://code.databio.org/COCOA/
BugReports: https://github.com/databio/COCOA
RoxygenNote: 6.1.0

Add SSH keys to your GitHub account. SSH keys will are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.

bioc-issue-bot commented 6 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

j-lawson commented 6 years ago

Hi @Liubuntu,

This is the same package that was previously under the issue "PCRSA". We changed the name to COCOA but an update to the version number no longer triggered a build after that so we have created a new issue. The reviewer was Hervé (@hpages ). I will transfer his comments to this issue. The original issue was: https://github.com/Bioconductor/Contributions/issues/865

Best, John Lawson

j-lawson commented 6 years ago

From Hervé (the formatting is messed up because only the text, not the markdown could be copied):

Hi John,

Thanks for this new contribution. It looks good. There are only a few minor things that need your attention:

The titles of your man page repeat exactly the content of the Description section so are way too long. See for example ?featuresAlongPC. (I think MIRA had the same issue when I first looked at it.)

Please move GenomicRanges from Imports to Depends. This will make the basic GRanges functionality immediately available to your users. Right now, because GenomicRanges is not attached to the search path, basic functionality like the start(), end() and width() getters are not available, so your serialized GRanges objects will be perceived as broken:

library(PCRSA) data(esr1_chr1) esr1_chr1

GRanges object with 765 ranges and 0 metadata columns:

seqnames ranges strand

1 chr1 1072710-1074806 *

2 chr1 1079163-1080853 *

3 chr1 1236995-1238060 *

4 chr1 1310031-1311345 *

5 chr1 1432920-1434006 *

... ... ... ...

761 chr1 245046415-245047368 *

762 chr1 245335995-245336978 *

763 chr1 246583472-246584185 *

764 chr1 246608249-246609543 *

765 chr1 246632212-246633395 *

-------

seqinfo: 72 sequences from an unspecified genome; no seqlengths

start(esr1_chr1)

[1] 1 1

Once GenomicRanges is in Depends, you don't need to prefix anything with GenomicRanges:: or to explicitely call library(GenomicRanges) in your man pages or vignettes.

Make sure that the various functions that have the mCoord argument work when a GRanges object is supplied (as advertised in the man pages):

library(PCRSA) library(GenomicRanges) data(brcaCoord1) data(brcaLoadings1) data(esr1_chr1) data(nrf1_chr1) GRList <- GRangesList(esr1_chr1, nrf1_chr1) mCoord <- GRanges(brcaCoord1$chr, IRanges(brcaCoord1$start, width=1L)) pcRegionSetEnrichment(loadingMat=brcaLoadings1, mCoord=mCoord, GRList=GRList, PCsToAnnotate=c("PC1", "PC2"), scoringMetric="rsMean")

Error in setnames(loadingDT, c("chr", "start", PCsToAnnotate)) :

Can't assign 4 names to a 5 column data.table

Please specify the reference genome (e.g. hg19, hg38, mm10, etc...) of the various *_chr1 and brcaCoord1 datasets. Ideally the user should be able to get this information (as well as the sequence lengths and circularity flags) by calling seqinfo() on the GRanges objects.

Also it might be obvious for most users but it wouldn't hurt reminding them that all the GRanges objects that are passed to the GRList argument of pcEnrichmentProfile() and pcEnrichmentProfile() should be on the same reference genome.

Thanks, H.

j-lawson commented 6 years ago

Here is my response:

Hi Hervé,

Thanks for the feedback! I have addressed all of your points as follows:

  1. I shortened the function titles.

  2. I moved GenomicRanges to "Depends".

  3. I took out explicit calls to GenomicRanges:: and library(GenomicRanges)

  4. I fixed the bug that prevented GRanges from being used as input and added more unit tests with GRanges input.

  5. I added reference genome, sequence length and circularity flag info to the built in GRanges objects. I also mentioned the reference genome in the data documentation.

  6. I mentioned that the reference genome for GRList should be consistent with the samples in the function parameter documentation. I also mentioned it in the main vignette.

In addition, I revised the main vignette/documentation and renamed the package since we think our package can have a wider scope and be applied to more data types than we initially thought. I updated the version in the Description to trigger a build but as of yet, nothing has happened. I'm not sure if there is some lag time with the system or a problem due to the name change.

Best, John

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

lshep commented 6 years ago

This issue should remain under the original issue and not start a new issue for the rename. Please move the comments from this issue onto the original issue number.