Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

periodicDNA #1511

Closed js2264 closed 4 years ago

js2264 commented 4 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/periodicDNA to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

js2264 commented 4 years ago

I have now managed to make the examples/tests run faster, so the check time is under 10 minutes total. Let me know what the next steps should be. Thanks!

js2264 commented 4 years ago

Hi @hpages, Any info on when the review will be done? Best, J

js2264 commented 4 years ago

Dear @hpages, Sorry to bother you again. Any info on when the full review would be available? Or are your last comments the only ones you had? Best, J

js2264 commented 4 years ago

Dear @hpages, Sorry to insist, but is there anything else to do for the review? I would like to submit a manuscript presenting the package and I need it to be available. It has been more than 10 weeks since the beginning of the submission process. Perhaps the review should be transferred to someone with more time available? Best, J

hpages commented 4 years ago

Hi @js2264 , Thanks for the changes to periodicDNA. I'm taking a deeper look at the package now and will get back to you in the next 24h or so. Thanks for your patience. H.

hpages commented 4 years ago

Hi @js2264 ,

An important effort is needed to improve the documentation. See below.

Best, H.

  1. Are utility functions like checkPeriodFromFourier or withSeq useful/meaningful to the end user? I can't figure out what users of the periodicDNA package would need these functions for. Furthermore, in the case of checkPeriodFromFourier I don't really understand what the function does and have no idea how I would use it in the context of a periodicDNA analysis. If a function is an internal utility (i.e. not intended to be used by the end user), please do not export or document it (see my earlier comment about that). Otherwise please improve its man page, including the examples provided in its man page.

  2. Assuming sampleGRanges is a function intended for the end user (see above point if it's not), then its man page needs improvements. The current description of what the function does exactly is minimalist and seems inaccurate. In particular it says that the returned GRanges has "the same chromosome, width and strand distributions than the original" but that doesn't seem to be the case:

    gr <- GRanges(LETTERS[1:5], IRanges(1:5, width=10), strand=c("+", "+", "-", "-", "*"))
    set.seed(333)
    sampleGRanges(gr)
    # GRanges object with 5 ranges and 0 metadata columns:
    #       seqnames    ranges strand
    #          <Rle> <IRanges>  <Rle>
    #   [1]        B      1-10      +
    #   [2]        C      7-10      -
    #   [3]        D       6-8      -
    #   [4]        D        11      -
    #   [5]        E       4-9      *
    #   -------
    #   seqinfo: 5 sequences from an unspecified genome

    Also why set these artificial seqlengths on the returned GRanges?

    seqlengths(gr2)
    #  A  B  C  D  E 
    # 10 11 12 13 14 

    That would not make much sense in general and would likely cause problems downstream (e.g. a findOverlaps() operation would fail because of incompatible seqlengths between query and subject).

  3. The vignette starts with "Dinucleotide periodicity over a set of Genomic Ranges". Unfortunately this somehow suggests that periodicDNA only handles dinucleotides which I believe is not the case. How about giving a general introduction to the periodicDNA packages first, and then delve into the particular case of dinucleotide periodicity?

  4. Why explaining how periodicDNA works in a separate vignette? This information is very relevant to anybody seriously interested in using the package so a much better place for it is in the main vignette. This is also an opportunity to introduce terms like distogram, to define them precisely, and to use them later thru the rest of the documentation.

  5. I could be misunderstanding how periodicDNA works but it seems to me that the message displayed by getPeriodicity() -- "Applying Fast Fourier Transform to the vector of distances" -- is incorrect. My understanding is that it applies the FFT to the normalised distogram, which is not the same as the vector of distances.

  6. It's very hard to find the documentation for the extra arguments accepted by getPeriodicity(). The man page for the function (typically accessed with ?getPeriodicity) doesn't say anything about these arguments but uses them (e.g. range_spectrum) in its examples. After a while I realized that these arguments are documented in separate man pages (in ?getPeriodicity.DNAStringSet and in ?getPeriodicity.GRanges). However most users will struggle to find these man pages, at best, or won't be able to find them. This problem is usually avoided by documenting the S3 generic and all its methods in the same man page. Another benefit of doing this is that it avoids repeating the same information across all the man pages (e.g. the long Value section) which will make long term maintenance significantly easier.

  7. The other S3 generics (getFPI, plotAggregateCoverage, etc..) have the same documentation problem.

  8. Why isn't motif an argument of the getPeriodicity() generic itself since all the methods actually need a motif in order to work? It is generally considered better design to put arguments that are common to all methods in the generic.

  9. The motif argument in ?getPeriodicity.DNAStringSet is described as "a dinucleotide of interest" and the 6th item in the list returned by the function is described as "The motif object is the dinucleotide being analysed". Unfortunately this again suggests that the function can only handle dinucleotides.

  10. getPeriodicity returns a list of 7 components but the man page for getPeriodicity only documents 6.

  11. Generally speaking functions are expected to catch edge cases (or misuse on the part of the user) in a graceful manner, and to display useful error messages. For example it's not immediately clear what's going on here:

    motif <- "AAA"
    seqs <- rep(DNAStringSet("AATCTAATCTAATCTAATCTAATCTAATCTAA"), 20)
    periodicity_result <- getPeriodicity(seqs, motif)
    # - Mapping k-mers.
    # range_spectrum (1:200) is wider than the maximum distance between pairs of k-mers (-Inf). Try shortening range_spectrum to a smaller range.
    # Error in getPeriodicity.DNAStringSet(seqs, motif) : 
    # In addition: Warning messages:
    # 1: In max(dists) : no non-missing arguments to max; returning -Inf
    # 2: In max(dists) : no non-missing arguments to max; returning -Inf
    # 3: In max(dists) : no non-missing arguments to max; returning -Inf

    So please implement some basic checking of the user input and make sure to print informative error messages.

  12. Keep the code chunks in the vignette as simple and easy to understand as possible, preferably sticking to traditional R syntax. For example, this is mostly a matter of taste but I don't find something like

    periodicity_result$periodicityMetrics %>% '['(.$Period == 10, )

    to be particularly easy to read/understand. The more traditional R syntax is IMO much clearer:

    subset(periodicity_result$periodicityMetrics, Period == 10)
js2264 commented 4 years ago

Hi @hpages, Thanks for your review. I will start working on it asap. Best, J

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

a390773 bump version 0.99.18

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/periodicDNA to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

js2264 commented 4 years ago

Hi @hpages,

I have tried to address all your comments. Briefly:

  1. I have deleted some functions and removed others from the NAMESPACE, per your instructions.
  2. I removed sampleGRanges function entirely.
  3. I completed the vignette and replaced "dinucleotide" by "k-mer" when necessary.
  4. I have fused the 2 vignettes together.
  5. I have corrected the statement.
  6. I have simplified the documentation per your suggestions.
  7. I have simplified the documentation per your suggestions.
  8. I have moved motif to the generic function directly. Thanks for the tip!
  9. Replaced "dinucleotide" by "k-mer" when necessary.
  10. I have completed and clarified getPeriodicity() manual.
  11. I have added clear errors messages wherever I could think of a edge case / potential misuse.
  12. Hopefully the vignette is now clear enough.

Let me know if there are further changes you'd want me to work on, Thanks!

hpages commented 4 years ago

Hi @js2264 ,

All the changes look good. Package is ready to go. Thanks for your contribution to the project.

Cheers, H.

bioc-issue-bot commented 4 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

mtmorgan commented 4 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/js2264.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("periodicDNA"). The package 'landing page' will be created at

https://bioconductor.org/packages/periodicDNA

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.