evolbioinfo / goalign

Goalign is a set of command line tools and an API to manipulate multiple sequence alignments. It is implemented in Go.
GNU General Public License v2.0

Remove list of columns and column ranges #7

Closed tseemann closed 3 years ago

tseemann commented 4 years ago

I have a text file of positions I want to remove from an alignment. I'd like something like subseq, but where I can choose all the positions to remove (or the inverse, the positions to keep).

% cat seq.afa
>x
-ATGC
>y
TATG-
>z
ATCGG

% cat pos.txt
1
3
5

% goalign subseq --sitefile pos.txt < seq.afa
>x
AG
>y
AG
>z
TG

I would also like an option for the positions in pos.txt to be relative to a specified "reference" taxon, ignoring its gaps.
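For readers following along, the requested behaviour can be sketched in a few lines of Python. This is only an illustration of the example above, not goalign code: the function names `read_fasta` and `remove_columns` and the `keep` flag are hypothetical, and positions are assumed to be 1-based alignment columns, as in pos.txt.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered list of (name, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            # Sequence lines may wrap, so append to the current record.
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def remove_columns(records, positions, keep=False):
    """Drop the 1-based alignment columns listed in `positions`.

    With keep=True the selection is inverted: only the listed columns survive.
    """
    selected = {p - 1 for p in positions}  # convert to 0-based indices
    result = []
    for name, seq in records:
        kept = "".join(c for i, c in enumerate(seq) if (i in selected) == keep)
        result.append((name, kept))
    return result


# The alignment and positions from the example above (assumed 1-based).
alignment = read_fasta(">x\n-ATGC\n>y\nTATG-\n>z\nATCGG\n")
for name, seq in remove_columns(alignment, [1, 3, 5]):
    print(f">{name}\n{seq}")
```

Removing columns 1, 3 and 5 leaves AG, AG and TG for x, y and z, which matches the output shown above.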

fredericlemoine commented 4 years ago

Dear Torsten,

Thanks for your suggestion.

1) If you want a sub-sequence relative to a reference sequence you can do it with goalign subseq --ref-seq <sequence name>. However, it will not take all coordinates from a file.

2) If you want to select several columns from an alignment, a workaround would be to use the goalign split --partition command, which takes all the columns defined for each partition in the partition file and outputs them in a separate file. That way you can define several partitions. However, this one does not take reference coordinates without gaps yet... Example of a partition file:

M1,p1=1-12/3,2-12/3
M2,p2=3-12/3

It will output 2 files, one per partition.
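To make the partition syntax concrete, here is a small Python sketch that expands each line of the example into the columns it selects. It assumes RAxML-style semantics, where `1-12/3` means "from column 1 to column 12, every 3rd column"; that reading, and the helper names `expand_range` and `parse_partition_line`, are assumptions for illustration and are not taken from the goalign source.

```python
def expand_range(spec):
    """Expand 'start-end/step', 'start-end' or a single 'n' into 1-based columns."""
    span, _, step = spec.partition("/")
    start, _, end = span.partition("-")
    first = int(start)
    last = int(end) if end else first
    stride = int(step) if step else 1
    return list(range(first, last + 1, stride))


def parse_partition_line(line):
    """Parse 'MODEL,name=range[,range...]' into (model, name, sorted columns)."""
    head, _, ranges = line.partition("=")
    model, _, name = head.partition(",")
    columns = set()
    for spec in ranges.split(","):
        columns.update(expand_range(spec.strip()))
    return model.strip(), name.strip(), sorted(columns)


for line in ["M1,p1=1-12/3,2-12/3", "M2,p2=3-12/3"]:
    model, name, columns = parse_partition_line(line)
    print(model, name, columns)
```

Under that assumption, p1 covers columns 1, 2, 4, 5, 7, 8, 10 and 11 (first and second codon positions), and p2 covers columns 3, 6, 9 and 12 (third codon positions).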

3) Finally, I implemented the command goalign subsites, which extracts given sites from an input alignment. Options --ref-seq, --reverse and --sitefile are available.
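The idea of coordinates relative to a reference sequence can be sketched as follows. This is a hypothetical Python illustration, not how goalign implements --ref-seq: the function name `reference_to_alignment_columns` is made up, and positions are assumed to be 1-based and counted on the reference with its gaps removed.

```python
def reference_to_alignment_columns(ref_seq, ref_positions, gap="-"):
    """Map 1-based positions on the ungapped reference to 1-based alignment columns."""
    wanted = set(ref_positions)
    columns = []
    ungapped = 0  # number of non-gap reference characters seen so far
    for col, char in enumerate(ref_seq, start=1):
        if char == gap:
            continue  # gap columns do not advance the reference coordinate
        ungapped += 1
        if ungapped in wanted:
            columns.append(col)
    return columns


# With x ("-ATGC") as reference, its 1st and 3rd residues sit in columns 2 and 4.
print(reference_to_alignment_columns("-ATGC", [1, 3]))
```

The resulting alignment columns can then be extracted or removed in every sequence, which is what makes a site list portable across alignments that share the same reference.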

Do not hesitate to tell me if it is what you were looking for.

tseemann commented 4 years ago

@fredericlemoine thanks for the quick response!

  1. I tried this but then realised it only supports one range (same as esl-alimask).
  2. I saw that, but could not find documentation on the partition file (it's cool, but clunky for this).
  3. That is awesome! It's in the dev branch, I see. I really need tagged + binary releases to deploy. If you make a new "pre-release" 0.3.3c, I can properly test and use all the recent features you've implemented in dev.

fredericlemoine commented 4 years ago

I just "pre"-released v0.3.3c

tseemann commented 4 years ago

Thanks! Just installed now and testing.

fredericlemoine commented 4 years ago

Hi @tseemann, can I close this issue? Also, if you want to have a look, I implemented the extract command, which extracts several sub-alignments from an input alignment.