More cleaning/filtration steps

I'm in the middle of making an IRMA module for adenoviruses. I came across your repo today and thought it would be useful for that purpose (I'm definitely thinking of using it to generate consensus sequences). The IRMA paper mentions a few filtration steps that I thought would be a natural fit here; they are described in the "Methods" section, "Datasets" sub-section, "Influenza alignment dataset" sub-sub-section, second paragraph. In particular, they mentioned:

Removing duplicate sequences
This should be the (second-)easiest of the bunch.
Removing sequences with greater than N ambiguous nucleotides
In the paper, the authors specified N=5, which may be a good default setting for Influenza A/B segments.
Removing sequences causing frame-shifts
I think this may be relatively difficult to calculate, compared to the others.
Removing short sequences
This functionality is already implemented (--remove_short), but it may be nice to have the ability to specify a percentage of the alignment as a cutoff.
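
To make the request a bit more concrete, here is a rough sketch of how I imagine these four filters behaving. It is plain Python and does not use CIAlign's internals: every function name and parameter below is hypothetical, and the alignment is assumed to be a dict mapping sequence names to gapped nucleotide strings (with "-" as the gap character). The frame-shift check in particular is only a crude heuristic that flags internal gap runs whose length is not a multiple of three; an insertion in one sequence produces the same gaps in every other sequence, so a real implementation would probably need to compare against a reference.

```python
import re

# IUPAC ambiguity codes for nucleotides (everything except A, C, G, T/U and gaps).
AMBIGUOUS = set("RYSWKMBDHVN")


def remove_duplicates(seqs):
    """Keep the first sequence seen for each distinct ungapped sequence."""
    seen = set()
    kept = {}
    for name, seq in seqs.items():
        key = seq.replace("-", "").upper()
        if key not in seen:
            seen.add(key)
            kept[name] = seq
    return kept


def remove_ambiguous(seqs, max_ambiguous=5):
    """Drop sequences with more than max_ambiguous ambiguity codes.

    The default of 5 is the value quoted from the IRMA paper above.
    """
    return {
        name: seq
        for name, seq in seqs.items()
        if sum(base in AMBIGUOUS for base in seq.upper()) <= max_ambiguous
    }


def remove_short_by_fraction(seqs, min_fraction=0.5):
    """Drop sequences whose ungapped length is below a fraction of the alignment width.

    min_fraction=0.5 is an arbitrary placeholder, not a recommended default.
    """
    width = max(len(seq) for seq in seqs.values())
    return {
        name: seq
        for name, seq in seqs.items()
        if len(seq.replace("-", "")) >= min_fraction * width
    }


def has_frameshift_gap(seq):
    """Heuristic: True if any internal gap run has a length not divisible by 3."""
    internal = seq.strip("-")  # ignore leading and trailing gaps
    return any(len(m.group()) % 3 != 0 for m in re.finditer(r"-+", internal))


def remove_frameshift_candidates(seqs):
    """Drop sequences flagged by the crude gap-length heuristic above."""
    return {name: seq for name, seq in seqs.items() if not has_frameshift_gap(seq)}
```

The filters take and return the same dict shape, so they could be chained in whatever order makes sense, e.g. `remove_short_by_fraction(remove_ambiguous(remove_duplicates(seqs)))`.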

KatyBrown / CIAlign

More cleaning/filtration steps #43