GFF attributes and wrong TE classification??

There are many factors that can affect how TEs are classified by RepeatClassifier, the part of RepeatModeler2 that assigns classifications to consensus sequences. RepeatClassifier uses the repeat database that RepeatMasker has been configured with to help determine the likely class of each consensus. In some cases, a consensus sequence contains only a small region of homology to a known repeat, but that region scores high enough for the consensus to be given a classification.

As always, automated tools are meant to provide a starting point for genomics studies. Repeat annotation is inherently challenging, and no single method will give "perfect" results, so some level of manual curation is always needed for robust TE classification and annotation. In the case of the AGO2 genes, there is likely a small tract of homology to something in your TE database that leads RepeatClassifier to give the consensus a classification. Also, if these genes are multicopy, RepeatModeler will pick them up as repeats, and manual curation will help remove them from the library.
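Deciding which consensi are gene-derived is the manual part; dropping them from the library afterwards is mechanical. As a minimal sketch of that last step, assuming you already have a list of consensus IDs to exclude, the Python below removes those families from a FASTA library. The RepeatModeler-style header format (`rnd-1_family-1#LINE/R1`), the example sequences, and the function names are illustrative assumptions, not part of EarlGrey.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_families(fasta_text, exclude_ids):
    """Return FASTA text without the families whose ID is in exclude_ids.

    Assumption: the family ID is the part of the header before '#',
    as in RepeatModeler-style headers like 'rnd-1_family-10#LINE/R1'.
    Sequences are written back on a single line each.
    """
    kept = []
    for header, seq in parse_fasta(fasta_text):
        family_id = header.split()[0].split("#")[0]
        if family_id not in exclude_ids:
            kept.append(f">{header}\n{seq}")
    return "\n".join(kept) + ("\n" if kept else "")


# Hypothetical library: pretend rnd-1_family-2 was found to be AGO2-derived.
library = """>rnd-1_family-1#LINE/R1
ACGTACGTAC
GTACGT
>rnd-1_family-2#DNA/TcMar
TTGACCA
"""
curated = drop_families(library, {"rnd-1_family-2"})
print(curated)
```

Keeping the exclusion list as a separate set of IDs means the curation decisions stay documented and can be re-applied if the library is regenerated.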

Depending on which Drosophila species you are working on, there may also be good-quality pre-existing TE libraries, so I would recommend looking into these as well.

Regarding the attributes, these are used by RepeatCraft to help resolve and defragment annotations; they are simply a way of storing that information for parsing. As usual, the boundaries of each TE are given by the chr, start, and end columns of the GFF and BED files.
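To make the split between coordinates and attributes concrete, here is a minimal Python sketch, assuming a standard 9-column, tab-separated GFF3 line. The attribute keys (`ID`, `family`), the source and type values, and the function names are hypothetical placeholders, not EarlGrey's actual output. It also shows the usual off-by-one conversion between the two formats: GFF uses 1-based inclusive coordinates, while BED uses 0-based half-open ones.

```python
def parse_gff_line(line):
    """Split one GFF3 feature line into coordinates, strand and attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GFF columns")
    seqid, _source, _type, start, end, _score, strand, _phase, attrs = fields
    # Column 9 holds key=value pairs separated by ';' (values are not
    # URL-decoded here, which is fine for simple values like these).
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return seqid, int(start), int(end), strand, attributes


def gff_to_bed_interval(seqid, start, end):
    """Convert 1-based inclusive GFF coordinates to 0-based half-open BED."""
    return seqid, start - 1, end


# Hypothetical feature line; the TE boundaries live in columns 1, 4 and 5.
line = "chr2L\texample\tLINE/R1\t1001\t1500\t.\t+\t.\tID=TE_0001;family=rnd-1_family-1"
seqid, start, end, strand, attrs = parse_gff_line(line)
print(gff_to_bed_interval(seqid, start, end))  # ('chr2L', 1000, 1500)
print(attrs["ID"])  # TE_0001
```

The point of the sketch is that the interval itself never depends on column 9: the attributes can be ignored for plotting or overlap analyses and only matter to tools that parse them, such as RepeatCraft.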

TobyBaril / EarlGrey, issue #99