fhcrc / seqmagick

An imagemagick-like frontend to Biopython SeqIO
http://seqmagick.readthedocs.org
GNU General Public License v3.0

add --relative-to option to extend --cut #22

nhoffman closed 13 years ago

nhoffman commented 13 years ago

It would be useful to be able to name a sequence within an alignment to serve as the numbering standard, so that --cut is applied relative to it. For example, to isolate the V3 region from an alignment of HIV genomes (assuming the alignment contains a sequence named "HXB2"):

seqmagick convert --cut 7110:7216 --relative-to HXB2 hiv_genomes.fasta env_v3.fasta

matsen commented 13 years ago

So will the numbering be the ungapped one?

cmccoy commented 13 years ago

It's implemented as numbered on the ungapped sequence specified - all columns containing a gap in the ref seq are removed, then a normal --cut is applied.
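To make the behaviour described here concrete, the following is a minimal sketch of the "degap, then cut" approach on a toy alignment. It is not seqmagick's actual code: the function name `cut_after_degapping`, the dict-of-strings alignment, and the 1-indexed inclusive `start`/`end` convention are all assumptions made for illustration.

```python
def cut_after_degapping(alignment, ref_name, start, end, gap_chars="-."):
    """Drop every column where the reference has a gap, then keep the
    1-indexed, inclusive range start..end of the remaining columns.

    `alignment` is a hypothetical {name: aligned_sequence} dict.
    """
    ref = alignment[ref_name]
    # Alignment columns in which the reference holds a residue.
    ref_columns = [i for i, c in enumerate(ref) if c not in gap_chars]
    # An ordinary cut, applied to the degapped column list.
    kept = ref_columns[start - 1:end]
    return {name: "".join(seq[i] for i in kept)
            for name, seq in alignment.items()}


# Toy alignment: seq1 carries a two-base insertion relative to HXB2.
alignment = {
    "HXB2": "AC--GTA",
    "seq1": "ACTTGTA",
    "seq2": "A-T-GCA",
}

for name, seq in cut_after_degapping(alignment, "HXB2", 2, 4).items():
    print(name, seq)
```

In this toy case the columns where HXB2 is gapped are discarded before the cut, so the "TT" that seq1 carries in those columns does not appear in the output.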

cmccoy commented 13 years ago

But now that I think about it, that's wrong - all insertions will be removed.

nhoffman commented 13 years ago

Right, what I had in mind was to just identify the columns in the alignment occupied by the positions specified in the reference sequence, translate to coordinates in the alignment, and cut there. No degapping required.

Thanks, Noah
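The approach described in this comment can be sketched roughly as follows: find the alignment columns that hold the requested reference positions, then slice every sequence at those columns. This is an illustration only, not the seqmagick implementation; the name `cut_relative_to`, the dict-of-strings alignment, and the 1-indexed inclusive coordinates are assumptions.

```python
def cut_relative_to(alignment, ref_name, start, end, gap_chars="-."):
    """Cut all sequences at the alignment columns that correspond to
    1-indexed, inclusive positions start..end of the ungapped reference.

    `alignment` is a hypothetical {name: aligned_sequence} dict.
    """
    ref = alignment[ref_name]
    # ref_columns[k] is the alignment column of reference residue k + 1.
    ref_columns = [i for i, c in enumerate(ref) if c not in gap_chars]
    if not 1 <= start <= end <= len(ref_columns):
        raise ValueError("cut range falls outside the reference sequence")
    col_start = ref_columns[start - 1]
    col_end = ref_columns[end - 1] + 1  # exclusive end for Python slicing
    # No degapping: columns between the two endpoints are kept as they are.
    return {name: seq[col_start:col_end] for name, seq in alignment.items()}


# Toy alignment: seq1 carries a two-base insertion relative to HXB2.
alignment = {
    "HXB2": "AC--GTA",
    "seq1": "ACTTGTA",
    "seq2": "A-T-GCA",
}

for name, seq in cut_relative_to(alignment, "HXB2", 2, 4).items():
    print(name, seq)
```

Because only the endpoints are translated, insertions relative to the reference that fall inside the cut region (the "TT" in seq1 here) are retained, and the output is still a valid alignment.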

On Sep 28, 2011, at 2:43 PM, Connor McCoy <reply@reply.github.com> wrote:

But now that I think about it, that's wrong - all insertions will be removed.


cmccoy commented 13 years ago

Great - that's happening now.