Dfam-consortium / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Penelope annotation #258

Closed anna-elisabet closed 1 month ago

anna-elisabet commented 5 months ago

Hello,

Across multiple runs of RepeatMasker 4.1.5 on my genome, I noticed that when masking with the RepBase libraries, Penelopes are reported at 0.00%, whereas with my own classified de novo library they are reported at 1%. The result also depends on whether I label them "#PLE" or "#LINE/Penelope" (as they are labeled in the RepBase database). With the "#LINE/Penelope" classification, 0% Penelopes are reported and the LINE percentage increases. I inferred that this classification leads to Penelopes being counted under "LINEs" rather than "Penelopes".

My question is whether my conclusion is right, and whether I need to reclassify the RepBase library (installed for RepeatMasker) so that all elements classified as "#LINE/Penelope" are changed to "#PLE", to ensure correct annotation in the .tbl file.

Thank you

rmhubley commented 2 months ago

The .tbl file is just one way to tabulate the results and, as you found, it is quite opinionated about how elements are grouped. I would suggest using the utility "RepeatMasker/util/buildSummary.pl" to process your .out file. This will give per-family stats so that you can tabulate them as you like.
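
To illustrate what "tabulate them as you like" could look like, here is a minimal Python sketch that reads the .out file directly instead of going through buildSummary.pl. It assumes the usual whitespace-separated .out layout, with the query begin/end in the 6th and 7th columns and the repeat class/family in the 11th. The function names, the alias mapping, and the sample family names (Penelope-1_XX, PLE-2_XX) are hypothetical and only there for illustration. It does not resolve overlapping annotations, so the totals may differ somewhat from the .tbl or buildSummary.pl numbers.

```python
from collections import defaultdict


def tally_by_class(out_lines, aliases=None):
    """Sum annotated query bases per class/family label from .out lines.

    aliases (hypothetical helper argument) maps a label to the bucket it
    should be counted under, e.g. {"LINE/Penelope": "PLE"}.
    """
    aliases = aliases or {}
    totals = defaultdict(int)
    for line in out_lines:
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        label = aliases.get(fields[10], fields[10])
        # Overlapping annotations are NOT merged here (simplifying assumption).
        totals[label] += end - begin + 1
    return dict(totals)


def percent_of_genome(totals, genome_size):
    """Convert base-pair totals to percentages of a given genome size."""
    return {label: 100.0 * bp / genome_size for label, bp in totals.items()}


# Made-up sample rows in the assumed .out layout (not real results).
SAMPLE = [
    "   SW   perc perc perc  query     position in query    matching  repeat",
    "score   div. del. ins.  sequence  begin end   (left)   repeat    class/family",
    "",
    "  1200  10.5  0.0  1.2  chr1   101   600  (9400) +  Penelope-1_XX  LINE/Penelope  1  500  (0)  1",
    "   800  12.0  2.1  0.0  chr1  1001  1200  (8800) C  PLE-2_XX  PLE  (0)  200  1  2",
]

if __name__ == "__main__":
    # Count both labels under one bucket, whatever the library header says.
    merged = tally_by_class(SAMPLE, aliases={"LINE/Penelope": "PLE"})
    print(percent_of_genome(merged, genome_size=10000))
```

With an alias mapping like this, Penelope elements end up in a single bucket whether the library header says "#PLE" or "#LINE/Penelope", so the library itself would not need to be relabeled just to get a combined figure.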