Bioconductor / Contributions

Contribute Packages to Bioconductor

pwalign #3361

Closed hpages closed 2 months ago

hpages commented 3 months ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 3 months ago

Hi @hpages

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: pwalign
Title: Perform pairwise sequence alignments
Description: The two main functions in the package are pairwiseAlignment() and
    stringDist(). The former solves (Needleman-Wunsch) global alignment,
    (Smith-Waterman) local alignment, and (ends-free) overlap alignment
    problems. The latter computes the Levenshtein edit distance or pairwise
    alignment score matrix for a set of strings.
biocViews: Alignment, SequenceMatching, Sequencing, Genetics
URL: https://bioconductor.org/packages/pwalign
BugReports: https://github.com/Bioconductor/pwalign/issues
Version: 0.99.0
License: Artistic-2.0
Encoding: UTF-8
Authors@R: c(
    person("Patrick", "Aboyoun", role="aut"),
    person("Robert", "Gentleman", role="aut"),
    person("Hervé", "Pagès", role="cre",
           email="hpages.on.github@gmail.com"))
Depends: BiocGenerics, S4Vectors, IRanges, Biostrings (>= 2.71.5)
Imports: methods, utils
LinkingTo: S4Vectors, IRanges, XVector, Biostrings
Enhances: Rmpi
Suggests: RUnit
LazyLoad: yes
Collate: 00datacache.R
    utils.R
    InDel-class.R
    AlignedXStringSet-class.R
    PairwiseAlignments-class.R
    PairwiseAlignmentsSingleSubject-class.R
    PairwiseAlignments-io.R
    align-utils.R
    pid.R
    substitution_matrices.R
    pairwiseAlignment.R
    stringDist.R
    zzz.R

hpages commented 3 months ago

pwalign contains the pairwiseAlignment-related stuff taken from Biostrings. The plan is to deprecate this stuff in Biostrings (in BioC 3.19), and to redirect the user to the stuff that is now in pwalign. Then to defunct it in Biostrings (in BioC 3.20), and to finally remove it from Biostrings (in BioC 3.21).
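For readers unfamiliar with the deprecate/defunct/remove cycle, here is a minimal sketch using base R's `.Deprecated()` and `.Defunct()`. The function names `old_align_v319()` and `old_align_v320()` are hypothetical stand-ins, and this is not necessarily how Biostrings implements the redirection.

```r
# Hypothetical stand-in for a function during the "deprecated" release:
# it still works, but warns and points the user to the new package.
old_align_v319 <- function(pattern, subject) {
    .Deprecated(msg = paste("old_align_v319() is deprecated;",
                            "please use the version in the pwalign package."))
    identical(pattern, subject)  # placeholder for the real work
}

# Hypothetical stand-in for the same function during the "defunct" release:
# calling it is now an error.
old_align_v320 <- function(pattern, subject) {
    .Defunct(msg = paste("old_align_v320() is defunct;",
                         "please use the version in the pwalign package."))
}

# The deprecated version still returns a result, plus a warning.
deprecated_result <- withCallingHandlers(
    old_align_v319("ACGT", "ACGT"),
    warning = function(w) {
        message("caught warning: ", conditionMessage(w))
        invokeRestart("muffleWarning")
    }
)

# The defunct version fails; here the error message is captured instead.
defunct_result <- tryCatch(old_align_v320("ACGT", "ACGT"),
                           error = function(e) conditionMessage(e))
```

In the final step of the cycle the function is simply deleted, so callers get a plain "could not find function" error.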

The motivations for this split are:

H.

lshep commented 3 months ago

I can pass this into building the reports however it will fail until the latest version of Biostrings is available.

hpages commented 3 months ago

Biostrings 2.71.5 (latest version) is already on nebbiolo1 so we should be good to go.

lshep commented 3 months ago

It has not propagated yet https://bioconductor.org/checkResults/devel/bioc-LATEST/Biostrings/

hpages commented 3 months ago

It doesn't need to. It's on the machine.

bioc-issue-bot commented 3 months ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your GitHub ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to activate your Bioconductor Git Credentials Account.

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): pwalign_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/pwalign to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

hpages commented 3 months ago

The 2 WARNINGs were expected.

One is about using RMarkdown instead of Sweave for the vignette. Note that the vignette was just taken from Biostrings and put in pwalign. You can see it here. It was written a long time ago by Patrick Aboyoun, the original author of the pairwiseAlignment stuff. Since it contains a lot of mathematical formulae that would be tricky to translate to markdown, I don't intend to make the conversion, at least not for now.

The other WARNING is about "Empty or missing \value sections found in man pages." This is a false positive that I reported here yesterday.

Let me know if you have questions.

H.

bioc-issue-bot commented 3 months ago

A reviewer has been assigned to your package for an in-depth review. Please respond accordingly to any further comments from the reviewer.

LiNk-NY commented 2 months ago

Hi Hervé, @hpages

Thank you for your submission. Please see the review below.

Best regards, Marcel


pwalign

DESCRIPTION

NAMESPACE

vignettes

R

weight <- as.integer(weight)

## instead of

if (!is.integer(weight))
    weight <- as.integer(weight)

setMethod("compareStrings",
          signature = c(pattern = "ANY", subject = "ANY"),
          function(pattern, subject) {
              compareStrings(as.character(pattern), as.character(subject))
          })

tests

> covr::package_coverage(type = "all")
pwalign Coverage: 76.46%
R/AlignedXStringSet-class.R: 0.00%
R/align-utils.R: 38.89%
R/PairwiseAlignmentsSingleSubject-class.R: 39.22%
R/pairwiseAlignment.R: 49.49%
R/zzz.R: 50.00%
R/00datacache.R: 66.67%
R/stringDist.R: 71.76%
R/PairwiseAlignments-class.R: 72.22%
R/PairwiseAlignments-io.R: 86.01%
R/substitution_matrices.R: 87.76%
src/align_pairwiseAlignment.c: 89.47%
src/align_utils.c: 99.48%
R/InDel-class.R: 100.00%
src/R_init_pairwiseAlignment.c: 100.00%

bioc-issue-bot commented 2 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: be1b36bf4fe64419cbcc64b9316c823fd576bb51

bioc-issue-bot commented 2 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): pwalign_0.99.1.tar.gz

hpages commented 2 months ago

Thanks Marcel for the feedback.

> Note that LazyLoad field is now ignored.

Removed.

> Consider converting the Rnw file to Rmd.

See my previous comment from 2 weeks ago above.

> RTobjs does not seem to be used anywhere, consider its removal.

Removed.

if (x is not the thing we want)
     x <- turn_x_into_the_thing_we_want(x)

I prefer that idiom over an unconditional x <- turn_x_into_the_thing_we_want(x). That's because even if x is already the thing we want, sadly turn_x_into_the_thing_we_want() is not guaranteed to be a no-op. For example, as.character(x) will drop the names of a character vector x, and as(x, "A") might transform x even if is(x, "A") is TRUE. It might matter (e.g. when the object is big and turn_x_into_the_thing_we_want(x) triggers a copy) or not (like here).

FWIW this has hit me a few times in the past so I got into the habit of systematically using the if (x is not the thing we want) x <- turn_x_into_the_thing_we_want(x) idiom without even thinking about it.
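A minimal base R sketch of the `as.character()` case mentioned above. The variable names are illustrative only and are not taken from the pwalign code.

```r
x <- c(a = "apple", b = "banana")   # already a character vector, with names

# Unconditional coercion: not a no-op, the names are stripped.
unconditional <- as.character(x)

# Conditional coercion: x is left untouched because it is already character.
conditional <- x
if (!is.character(conditional))
    conditional <- as.character(conditional)

print(names(unconditional))  # NULL
print(names(conditional))    # "a" "b"
```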

> Minor: To avoid repetition, perhaps use a default compareStrings,ANY,ANY-method etc...

I simplified the compareStrings() methods a bit. A minor disadvantage of a compareStrings,ANY,ANY-method that blindly coerces anything you throw at it to character is that it might do some weird/unexpected things for some exotic stuff. And the error that will result in that case will probably not be of great help to the end user.
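To illustrate the concern, here is a small sketch with a hypothetical generic, `compare_sketch()`. It is not the real compareStrings() from pwalign; it only mimics the "coerce everything to character" fallback.

```r
library(methods)

# Hypothetical generic, NOT the real compareStrings() from pwalign.
setGeneric("compare_sketch",
           function(pattern, subject) standardGeneric("compare_sketch"))

setMethod("compare_sketch", c("character", "character"),
          function(pattern, subject) pattern == subject)

# Catch-all fallback that blindly coerces both arguments to character.
setMethod("compare_sketch", c("ANY", "ANY"),
          function(pattern, subject)
              compare_sketch(as.character(pattern), as.character(subject)))

# A list is deparsed element by element, so list(1:3) silently "matches"
# the string "1:3" instead of being rejected.
exotic <- compare_sketch(list(1:3), "1:3")
print(exotic)

# A function cannot be coerced at all; the error comes from as.character()
# and says nothing about which inputs compare_sketch() actually supports.
err <- tryCatch(compare_sketch(sum, "sum"),
                error = function(e) conditionMessage(e))
print(err)
```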

> It looks like useMpi is disabled. Will it work again or should it be removed?

I disabled this. Rmpi has been in Enhances (as opposed to Suggests) for the last 15 years or so, and the way things are implemented in pairwiseAlignment() is that it will be used only if the user explicitly loads it before calling the function. This means that the useMpi mode has not been tested on the daily builds for the last 15 years. Furthermore, since this is an undocumented feature, I suppose that nobody has ever used it, except Patrick. Last but not least: it's not covered by the unit tests either.

I might re-enable it at some point in the not too distant future but some serious testing will be required first. Also, this predates BiocParallel so the Rmpi approach might be completely obsolete, I don't know. Will need to revisit, test, assess, and decide what to do with it.

Disclaimer: I've never used Rmpi myself (Patrick Aboyoun implemented this).

> The arguments should list all possible options e.g., in stringDist()

Usually yes. I think the reason Patrick didn't do it in this case may be that the list of all possible values for the method argument is a little bit long (c("levenshtein", "hamming", "quality", "substitutionMatrix")), so it could be ugly to see such a long list in the definition of the S4 generic and all its methods, especially in the \usage section of the man page. Also, not all the stringDist() methods might support all these options at the moment, or future methods might want to support different options. As long as the man page for stringDist() lists all the supported values of method, I can live with that.
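One way to keep the signature short while still validating the argument is to pass the choices to match.arg() inside the function body. The sketch below uses a hypothetical function, `string_dist_sketch()`, with the four values quoted above. It is not the actual stringDist() implementation.

```r
# Hypothetical function; only the argument handling is sketched.
string_dist_sketch <- function(x, method = "levenshtein") {
    # The full list of choices lives here rather than in the signature.
    method <- match.arg(method, c("levenshtein", "hamming",
                                  "quality", "substitutionMatrix"))
    method  # a real implementation would branch on this value
}

default_method <- string_dist_sketch(character(0))
partial_method <- string_dist_sketch(character(0), method = "ham")

# An unsupported value is rejected with an error listing the valid choices.
bad_method <- tryCatch(string_dist_sketch(character(0), method = "cosine"),
                       error = function(e) conditionMessage(e))

print(default_method)
print(partial_method)
print(bad_method)
```

The trade-off is the one discussed above: the valid values no longer show up in the usage line, so the man page has to list them explicitly.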

> Consider promoting functions from Biostrings to exported functions rather than using :::

My understanding is that this is acceptable when the upstream and client packages have the same maintainer, which is why R CMD check doesn't say anything in that case.

H.

LiNk-NY commented 2 months ago

Hi Hervé, @hpages Thanks for making those changes. The package has been accepted. Best regards, Marcel

bioc-issue-bot commented 2 months ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 2 months ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/hpages.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2 pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("pwalign"). The package 'landing page' will be created at

https://bioconductor.org/packages/pwalign

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.