bokulich-lab / RESCRIPt

REference Sequence annotation and CuRatIon Pipeline
BSD 3-Clause "New" or "Revised" License

ENH: replace-taxonomy action #116

Closed mikerobeson closed 3 years ago

mikerobeson commented 3 years ago

This DRAFT PR provides a very simple way to find and replace taxonomy strings. For example, a user might want to fix the Escherichia-Shigella garbage by providing the following information as a metadata file:

id                 replacements
g__Salmonella      g__Escherichia-Shigella
g__Escherichia;    g__Escherichia-Shigella;
g__Shigella        g__Escherichia-Shigella

Note: although the current implementation uses re, it passes every search string through re.escape, so only literal find-and-replace is carried out (no regex patterns). Just keeping it simple for now.
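
To illustrate the literal-only behavior, here is a minimal sketch. It is not the code in this PR: the function name `replace_taxonomy`, the plain-dict inputs, and the example taxonomy strings are all hypothetical. It also assumes the `id` column holds the string to search for and the `replacements` column holds the new string. The only thing taken from the description above is that each search string goes through `re.escape`.

```python
import re


def replace_taxonomy(taxonomy, replacements):
    """Return a copy of `taxonomy` with literal substrings replaced.

    taxonomy:     dict of feature ID -> taxonomy string (hypothetical input shape)
    replacements: dict of search string -> replacement string
                  (assumed direction: metadata `id` -> `replacements` column)
    """
    if not replacements:
        return dict(taxonomy)
    # re.escape turns regex metacharacters (".", "|", "(", ...) into literals,
    # so every search string is matched exactly as written.
    # Longer search strings go first so they win when one is a prefix of another.
    ordered = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    # A function is used as the replacement so backslashes in replacement
    # strings are inserted as-is rather than treated as group references.
    return {
        fid: pattern.sub(lambda m: replacements[m.group(0)], tax)
        for fid, tax in taxonomy.items()
    }


# Hypothetical taxonomy strings, using two rows from the table above.
example_taxonomy = {
    "seq1": "d__Bacteria; f__Enterobacteriaceae; g__Escherichia; s__Escherichia_coli",
    "seq2": "d__Bacteria; f__Enterobacteriaceae; g__Shigella; s__Shigella_flexneri",
}
example_replacements = {
    "g__Escherichia;": "g__Escherichia-Shigella;",
    "g__Shigella": "g__Escherichia-Shigella",
}

for fid, tax in replace_taxonomy(example_taxonomy, example_replacements).items():
    print(fid, tax, sep="\t")
```

In this sketch, all search strings are combined into one compiled pattern and applied in a single pass, so text inserted by one replacement is never rescanned by another. The trailing semicolon in `g__Escherichia;` keeps that entry from matching inside an existing `g__Escherichia-Shigella` label.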

Must do:

Should eventually do:

Other thoughts?