Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings
54 stars 16 forks source link

add option to make asymmetric nuc substitution matrix #77

Closed acvill closed 2 years ago

acvill commented 2 years ago

Edited nucleotideSubstitutionMatrix function in pairwiseAlignment.R to include an option to generate an asymmetric nucleotide substitution matrix. This is useful when you don't want to penalize a query sequence that is aligned to a subject / consensus with ambiguous bases.

hpages commented 2 years ago

Thanks for the PR. Can you please document the new symmetric argument (in man/substitution_matrices.Rd) and also add some example in the \examples section to illustrate its typical use? Also feel free to add your name in the \author section, with a note about what you contributed to. Something like:

\author{H. Pagès and P. Aboyoun, with contribution from Albert Vill (support
        for asymmetric matrices in \code{nucleotideSubstitutionMatrix()})}

Thanks!

acvill commented 2 years ago

Thanks for the comment, I've implemented the suggested changes to man/substitution_matrices.Rd

hpages commented 2 years ago

Looks good. A few more minor things:

  1. Only compute nucleotideSubstitutionMatrix(symmetric = FALSE) once i.e. set the result in a variable (e.g. nsm) and use nsm in your two calls to pairwiseAlignment().
  2. Before using nsm, it could be somewhat educational to show some of the values in it e.g.:
    nsm["M", c("A", "C", "G", "T")]
    #  A   C   G   T 
    # 0.5 0.5 0.0 0.0 
    nsm[c("A", "C", "G", "T"), "M"]
    # A C G T 
    # 1 1 0 0 

    and to comment on that.

  3. There are too many empty lines between the examples you added and the examples below that. One empty line is enough.

Thanks again, H.

hpages commented 2 years ago

Great, thanks! I'll bump the package version so the new version will propagate to BioC 3.16 (current devel) in the next 24h-48h and become available via BiocManager::install().