Closed: oushujun closed this issue 2 years ago
Unfortunately, RepeatMasker's classification system doesn't include the name DNA/MuDR, and because your library uses the id#type/subtype nomenclature, you have indicated to RepeatMasker that you are using its scheme, so it will alter the final annotations as you described. I agree that this should be made optional, but at the moment it's intertwined in many places. I would recommend simply changing your input library ID nomenclature to avoid this automatic recognition. For instance, name your families like "id_type_subtype" or even simply "id_type/subtype".
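As a rough illustration of the renaming suggested above, here is a minimal Python sketch that rewrites FASTA headers of the form ">id#type/subtype" to ">id_type/subtype". The function names and the family name ATMU1 are hypothetical, and the sketch assumes the class label is everything between the "#" and the first whitespace in the header; it is not part of RepeatMasker.

```python
import re

# Matches a header like ">id#type/subtype", optionally followed by a description.
# Assumption: the id contains no '#' or whitespace.
_HEADER = re.compile(r"^>(?P<id>[^#\s]+)#(?P<cls>\S+)(?P<rest>.*)$")


def neutralize_header(line: str) -> str:
    """Rewrite '>id#type/subtype' as '>id_type/subtype'; return other lines unchanged."""
    m = _HEADER.match(line)
    if m is None:
        return line
    return f">{m.group('id')}_{m.group('cls')}{m.group('rest')}"


def neutralize_library(lines):
    """Apply neutralize_header to every line of a FASTA file (newlines already stripped)."""
    return [neutralize_header(line) for line in lines]


if __name__ == "__main__":
    # Hypothetical two-line library; a real one would be read from problem.fa.
    library = [">ATMU1#DNA/MuDR", "ACGTACGTACGT"]
    for out_line in neutralize_library(library):
        print(out_line)
```

Sequence lines and headers without a "#" pass through untouched, so the script can be run over a whole library and the result written to a new file for use with -lib.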
Hi Robert,
Thanks for your insights. I can definitely change the input file to follow RepeatMasker's nomenclature. Is there a list of the nomenclature that I can follow?
Best, Shujun
The "Types/Subtypes" used by RepeatMasker map directly to the Dfam classification system. A table of all the classifications may be found at https://www.dfam.org/classification and is downloadable as a TSV file.
Hello Robert,
I am using RepeatMasker version 4.1.1 installed in Linux via conda.
I constructed a library using known Arabidopsis TEs with their classification following RepeatMasker's naming scheme. For example:
In the masking results, some of the classifications were not reported exactly as provided. For example, in column 11 of the .out file, some were shown as
DNA/MULE-MuDR
but I only have
DNA/MuDR
in the custom library. Is there a way to keep the classification as provided?
Reproduction steps
RepeatMasker -pa 20 -q -div 40 -lib problem.fa -cutoff 225 -gff Col.test.fa
The test files are included in this zipped file: repeatmasker_files.zip
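To make the mismatch easier to spot, column 11 of the .out file can be tallied and compared with the classes present in the library. Below is a minimal Python sketch; it assumes the usual whitespace-delimited .out layout, in which data rows start with an integer score, and library headers of the form ">id#type/subtype". All function names are hypothetical.

```python
from collections import Counter


def out_classes(out_lines):
    """Count the class/family labels found in column 11 of RepeatMasker .out rows."""
    counts = Counter()
    for line in out_lines:
        fields = line.split()
        # Skip blank lines and the column-header lines, whose first token is not an integer.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        counts[fields[10]] += 1
    return counts


def library_classes(fasta_lines):
    """Collect the class labels that follow '#' in the library's FASTA headers."""
    classes = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens and "#" in tokens[0]:
                classes.add(tokens[0].split("#", 1)[1])
    return classes


def unexpected_classes(out_lines, fasta_lines):
    """Return the .out classes (with counts) that do not appear in the library."""
    known = library_classes(fasta_lines)
    return {c: n for c, n in out_classes(out_lines).items() if c not in known}
```

Calling unexpected_classes with the lines of the .out file and of problem.fa would list labels such as DNA/MULE-MuDR that RepeatMasker introduced on its own.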
Thank you! Shujun