Allow filter/replace on IDs without a description.

fhcrc / seqmagick

An imagemagick-like frontend to Biopython SeqIO

http://seqmagick.readthedocs.org

GNU General Public License v3.0


Allow filter/replace on IDs without a description. #48

Closed bcclaywell closed 9 years ago

bcclaywell commented 9 years ago

See #47; partially reverts 515f55eb7ed2607b3c216ca5c8ced4bae8df51a5.

The sticky part is replacement. This commit's approach:

Substitution is always performed on the sequence ID. If the first "word" of the description matches the ID, assume that the description is FASTA-like: perform substitution only on the "rest" of the description, then prepend the (possibly modified) ID. If the first word does not match the ID, perform substitution on the entire description.
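For readers following along, the approach described above could be sketched roughly as follows. This is only an illustration, not the code in the commit: `replace_in_header` and its parameters are hypothetical names, and the record is reduced to a plain ID string and description string.

```python
import re


def replace_in_header(seq_id, description, pattern, repl):
    """Hypothetical helper (not seqmagick's actual code).

    Always substitutes on the ID. If the description's first word is the
    ID (FASTA-like), substitutes only on the rest of the description and
    puts the possibly modified ID back on the front.
    """
    regex = re.compile(pattern)
    new_id = regex.sub(repl, seq_id)

    words = description.split(None, 1)
    if words and words[0] == seq_id:
        # FASTA-like description: "<id> <rest>"
        rest = words[1] if len(words) > 1 else ""
        new_rest = regex.sub(repl, rest)
        new_description = new_id + " " + new_rest if new_rest else new_id
    else:
        # Description does not start with the ID (or is empty):
        # substitute on the whole thing.
        new_description = regex.sub(repl, description)
    return new_id, new_description
```

With this sketch, an ID with an empty description is still rewritten, which is the case the pull request title refers to. A pattern that spans the ID and the description together would never match, because the two parts are substituted separately.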

cmccoy commented 9 years ago

My 2¢, feel free to ignore.

I would expect to be able to replace an entire FASTA header with a regex, so for the hypothetical header:

>sequence1 description

applying --pattern-replace "sequence1 description" "sequence2 description2" would yield the new header:

>sequence2 description2

That would change the logic: if the description starts with the ID (FASTA-like), perform the replace on the description and update the ID to description.split(None, 1)[0]. If not, perform the replace on both.
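A rough sketch of this alternative, under the same simplifications (plain strings instead of a sequence record; `replace_whole_header` is a hypothetical name, not a seqmagick function). What to do when the replacement leaves an empty description is not covered in the discussion, so keeping the old ID in that case is purely an assumption here.

```python
import re


def replace_whole_header(seq_id, description, pattern, repl):
    """Hypothetical helper illustrating the suggested logic."""
    regex = re.compile(pattern)
    words = description.split(None, 1)

    if words and words[0] == seq_id:
        # FASTA-like: the description is the whole header, so substitute
        # on it and re-derive the ID from its first word.
        new_description = regex.sub(repl, description)
        new_words = new_description.split(None, 1)
        # Assumption: keep the old ID if nothing is left after replacing.
        new_id = new_words[0] if new_words else seq_id
    else:
        # Not FASTA-like (or no description): substitute on both.
        new_id = regex.sub(repl, seq_id)
        new_description = regex.sub(repl, description)
    return new_id, new_description


# The hypothetical header from the comment above.
print(replace_whole_header(
    "sequence1", "sequence1 description",
    "sequence1 description", "sequence2 description2",
))
```

Because the substitution runs over the full header in the FASTA-like case, a pattern that spans both the ID and the description can match, which is the behavior asked for above.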

bcclaywell commented 9 years ago

No, that's a good idea -- and would probably be more consistent with current behavior and existing scripts. I'll tinker with that a bit. Thanks, Connor!

matsen commented 9 years ago

@cmccoy thanks a bunch for your input. If this seems satisfactory to you (or you don't have time to look) it looks good to merge!

cmccoy commented 9 years ago

Looks great to me. (Sent by email on Apr 6, 2015, 3:10 PM, in reply to Erick Matsen's comment above.)

bcclaywell commented 9 years ago

Thanks guys!