Closed by elcortegano 2 years ago
In the util/ directory there is a script written by David Ray (RM2BED.py) that may be useful. It can read in a .out or .align file and filter on min_length or on min/max divergence. Unfortunately, we do not have a universal set of tools for this, just a set of ad hoc scripts we have used internally over the years.
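For illustration only, the kind of filtering described above can be sketched in a few lines of Python. This is not RM2BED.py and does not reproduce its options; the function name `filter_out_lines` and its parameters are hypothetical. It assumes the standard whitespace-separated RepeatMasker .out layout (SW score, perc div., perc del., perc ins., query sequence, query begin, query end, ...) and measures match length on the query coordinates. Treating "100 minus divergence" as percent identity is only an approximation, since indels are reported in separate columns.

```python
def filter_out_lines(lines, min_length=0, max_divergence=100.0):
    """Yield .out data lines whose query span and divergence pass both filters.

    Hypothetical helper: assumes the usual RepeatMasker .out column order.
    """
    for line in lines:
        fields = line.split()
        # Header and blank lines do not start with an integer SW score.
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        divergence = float(fields[1])                # "perc div." column
        begin, end = int(fields[5]), int(fields[6])  # query begin / end
        length = end - begin + 1                     # inclusive coordinates
        if length >= min_length and divergence <= max_divergence:
            yield line


# Made-up records in .out layout, used only to exercise the function.
EXAMPLE = [
    "   SW   perc perc perc  query     position in query     matching  repeat",
    "score   div. del. ins.  sequence  begin end    (left)   repeat    class/family",
    "",
    " 1320 15.6  6.2  0.0  HSU08988  6563 6781 (22462) C  MER7A  DNA/MER2_type  (0) 336 103  1",
    "   12  3.1  0.0  0.0  HSU08988  7001 7020 (22223) +  (CA)n  Simple_repeat  1 20 (0)  2",
]

if __name__ == "__main__":
    for kept in filter_out_lines(EXAMPLE, min_length=100, max_divergence=20.0):
        print(kept)
```

With `min_length=100`, the 20 bp microsatellite hit is dropped and the 219 bp hit is kept. The same length cutoff could be applied to a real file by passing an open file handle as `lines`.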
Hi,
I am using the `.align` file to run the `calcDivergenceFromAlign.pl` script, and I am wondering whether there is any way to filter the data in the alignment file (and in the other RepeatMasker output files) by percent sequence identity and by length of the matched sequence.

RepeatMasker was run using a custom library for a well-known repeat motif with a fixed length (1-2 kb) and excluding small repeats (`-nolow`). However, the motif itself contains short microsatellite sequences that do appear in the output files, making the `.align` file highly unspecific to the real queried motif.

The RepeatMasker version is 4.1.2-p1, installed from bioconda.