Closed bcclaywell closed 9 years ago
My 2¢, feel free to ignore.
I would expect to be able to replace an entire FASTA header with a regex, so for the hypothetical header:
>sequence1 description
applying --pattern-replace "sequence1 description" "sequence2 description2" would yield the new header:
>sequence2 description2
That would change the logic: if the description starts with the ID (FASTA-like), perform the replace on the description and update the id to description.split(None, 1)[0]. If not, perform the replace on both the ID and the description.
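To make the suggestion concrete, here is a minimal sketch of that logic. It is not seqmagick's actual code: replace_header is a hypothetical helper, and the record is a stand-in object with only id and description attributes (similar in shape to a Biopython SeqRecord).

```python
import re
from types import SimpleNamespace


def replace_header(record, pattern, repl):
    """Hypothetical helper: apply a regex replacement to a record's header.

    `record` only needs `id` and `description` attributes.
    """
    regex = re.compile(pattern)
    if record.description.startswith(record.id):
        # FASTA-like: the description holds the whole header, so replace
        # there and re-derive the ID from its first word.
        record.description = regex.sub(repl, record.description)
        parts = record.description.split(None, 1)
        record.id = parts[0] if parts else ""
    else:
        # Otherwise treat ID and description as independent fields.
        record.id = regex.sub(repl, record.id)
        record.description = regex.sub(repl, record.description)
    return record


# Stand-in record for the header ">sequence1 description"
record = SimpleNamespace(id="sequence1", description="sequence1 description")
replace_header(record, "sequence1 description", "sequence2 description2")
print(">" + record.description)
```

With this approach a pattern may span the ID and the rest of the header, which is what the example above relies on.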
No, that's a good idea -- and would probably be more consistent with current behavior and existing scripts. I'll tinker with that a bit. Thanks, Connor!
@cmccoy thanks a bunch for your input. If this seems satisfactory to you (or you don't have time to look) it looks good to merge!
Looks great to me. (Sent by email, Apr 6, 2015, in reply to the comment above on fhcrc/seqmagick pull request #48.)
Thanks guys!
See #47; partially reverts 515f55eb7ed2607b3c216ca5c8ced4bae8df51a5.
The sticky part is replacement. This commit's approach:
Substitution is always performed on the sequence ID. If the first "word" of the description matches the ID, assume that the description is FASTA-like and perform substitution only on the "rest" of the description, then prepend the (possibly modified) ID. If the first word does not match the ID, perform substitution on the entire description.
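The approach described above can be sketched roughly as follows. This illustrates the stated rules and is not the code from the commit: pattern_replace is a hypothetical name, and the record is a stand-in object with only id and description attributes.

```python
import re
from types import SimpleNamespace


def pattern_replace(record, pattern, repl):
    """Hypothetical sketch of the substitution rules described above."""
    regex = re.compile(pattern)
    old_id = record.id
    # Substitution is always performed on the ID.
    new_id = regex.sub(repl, old_id)

    words = record.description.split(None, 1)
    if words and words[0] == old_id:
        # FASTA-like description: substitute only in the "rest",
        # then put the (possibly modified) ID back in front.
        rest = words[1] if len(words) > 1 else ""
        new_rest = regex.sub(repl, rest) if rest else ""
        record.description = f"{new_id} {new_rest}" if new_rest else new_id
    else:
        # First word is not the ID: substitute in the whole description.
        record.description = regex.sub(repl, record.description)

    record.id = new_id
    return record


fasta_like = SimpleNamespace(id="sequence1",
                             description="sequence1 some description")
pattern_replace(fasta_like, r"sequence", "seq")
print(fasta_like.id, "|", fasta_like.description)
```

One consequence of this sketch is that the ID and the rest of the description are substituted separately, so a pattern that spans both (such as "sequence1 description") would not match a FASTA-like header.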