BioJulia / BioAlignments.jl

Sequence alignment tools
MIT License
60 stars 24 forks source link

Add alignment string position support #44

Closed alyst closed 3 years ago

alyst commented 3 years ago

Alignment string position support

Types of changes

This PR implements the following changes:

:clipboard: Additional detail

:ballot_box_with_check: Checklist

alyst commented 3 years ago

The failures due to Registry.toml problems, the testing even didn't start. Also: is it possible to update CI to test on latest stable Julia and also on nightly (with allow_failures). I guess only LTS (is it 1.0 or 1.1?) Julia should be kept from the previous versions.

alyst commented 3 years ago

Some other notes on the PR:

alyst commented 3 years ago

I don't know who would be the best to review, but since you were committing recently, maybe you can take a look, @BenJWard ? Or maybe you can suggest a more suitable reviewer? Thanks in advance!

alyst commented 3 years ago

@BenJWard @jakobnissen If you would have time, could you please review this PR?

alyst commented 3 years ago

I'm not 100% sure what this functionality is useful for, though. Does the concept of an "alignment position" make sense?

For multiple sequence alignments it definitely does, because it's actually the column of MSA.

But it also does make sense for pairwise one. If you want to display the pairwise alignment, you may need to know what is the position of a specific letter from either sequence in the aligned sequence. My use case was to correctly display the post-translational modifications (phosphorylation, ubiquitination etc) of inidividual amino acids on the pairwise-aligned homologous protein sequences. There could be regions that are present only on the first sequence or regions that are present only on the second one. So you definitely need an aligned position, which preserves everything.

alyst commented 3 years ago

@jakobnissen It looks like at the moment CI seems broken. Should this package be updated to use GitHab actions for CI? I can try submitting the PR for that, but maybe it would be better if someone who has better knowledge of BioJulia ecosystem is doing that.

jakobnissen commented 3 years ago

Yeah... BioJulia is strapped for dev manpower at the moment, so there is a lot of cleanup and housekeeping that needs to be done in all sorts of packages, and just haven't been done. If you make a PR, I'll merge it! You can look at e.g. BioSequences for how to do this.

codecov[bot] commented 3 years ago

Codecov Report

Merging #44 (c8e434c) into master (3e3910c) will decrease coverage by 0.41%. The diff coverage is 87.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #44      +/-   ##
==========================================
- Coverage   88.82%   88.41%   -0.42%     
==========================================
  Files          17       17              
  Lines        1056     1079      +23     
==========================================
+ Hits          938      954      +16     
- Misses        118      125       +7     
Impacted Files Coverage Δ
src/BioAlignments.jl 100.00% <ø> (ø)
src/alignment.jl 79.16% <76.92%> (-4.62%) :arrow_down:
src/alignedseq.jl 96.92% <100.00%> (+0.31%) :arrow_up:
src/anchors.jl 83.33% <100.00%> (+8.33%) :arrow_up:
src/operations.jl 90.32% <100.00%> (ø)
src/pairwise/algorithms/common.jl 91.17% <100.00%> (+4.21%) :arrow_up:
src/pairwise/algorithms/hamming_distance.jl 100.00% <100.00%> (ø)
src/pairwise/alignment.jl 99.06% <100.00%> (-0.05%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3e3910c...cc8237c. Read the comment docs.

jakobnissen commented 3 years ago

@alyst Ping me when it's ready to be merged :)

alyst commented 3 years ago

@jakobnissen Thank you! I think the PR is good to go. I have addressed your suggestions, also I have added tests to the new aln2xxx/xxx2aln() methods that were not covered. There are still new lines in the PR that are not covered, but they were derived from the existing exceptions that got more specific after alnpos was added. They were not covered by the tests before. So I think it's fine to merge this PR, and address the coverage later by introducing more test cases.

jakobnissen commented 3 years ago

Thanks for putting in the work!

timholy commented 2 years ago

Docs need updating: https://github.com/BioJulia/BioAlignments.jl/blob/master/docs/src/alignments.md#representing-alignments