KatyBrown / CIAlign

MIT License

Retain sequences #45

Closed: KatyBrown closed this 2 years ago

KatyBrown commented 2 years ago

I added new parameters that let users name sequences to exclude from remove_divergent, crop_ends and remove_short (i.e. the functions that operate on rows rather than columns). For example, an outgroup can be kept in the alignment rather than being removed by remove_divergent, and sequences that are known to be correct can be left unedited by CIAlign.

There are 12 parameters: a set of three for each of remove_divergent, crop_ends and remove_short, plus a set of three that applies to all row-wise cleaning steps. Within each set, one parameter specifies sequence names directly (--retain, --crop_ends_retain, --remove_divergent_retain, --remove_short_retain), one gives the path to a file containing a list of sequence names (--retain_list, --crop_ends_retain_list etc.) and one gives a string to match in the names of the sequences to be ignored (--retain_str, --crop_ends_retain_str etc.).
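To illustrate how the three kinds of option in each set could combine, here is a minimal sketch. It is not the code in this PR: the function names `resolve_retained` and `names_to_skip`, the one-name-per-line file format and the plain substring matching are all assumptions made for the example.

```python
def resolve_retained(all_names, retain=(), retain_list_path=None, retain_str=()):
    """Return the set of sequence names that a cleaning step should skip.

    Hypothetical helper: the arguments mirror the idea behind --retain,
    --retain_list and --retain_str, but this is not CIAlign's implementation.
    """
    # Names given directly
    keep = set(retain)
    # Names read from a file (assumed format: one name per line)
    if retain_list_path is not None:
        with open(retain_list_path) as handle:
            keep.update(line.strip() for line in handle if line.strip())
    # Names containing any of the given strings (assumed: substring match)
    for pattern in retain_str:
        keep.update(name for name in all_names if pattern in name)
    # Ignore requested names that are not in the alignment
    return keep & set(all_names)


def names_to_skip(all_names, general, step_specific):
    """Combine the options for all row-wise steps with those for one step."""
    return (resolve_retained(all_names, **general)
            | resolve_retained(all_names, **step_specific))


names = ["outgroup_1", "seq_A", "seq_B", "ref_human"]
skip = names_to_skip(
    names,
    general={"retain_str": ["outgroup"]},    # applies to every row-wise step
    step_specific={"retain": ["ref_human"]}, # applies to one step only
)
print(sorted(skip))  # ['outgroup_1', 'ref_human']
```

In this sketch the general options and the step-specific options are simply unioned, so a sequence retained for all steps is always skipped by each individual step as well.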

I have added a sequence to example1 so that it's possible to test these parameters.

I've also updated the manual, tests etc. accordingly.

I made a few very minor changes to formatting of the argP.py file (just tidying up) and to ini_template.txt.