Dfam-consortium / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Output .tbl not seems complete #165

Closed marco91sol closed 2 years ago

marco91sol commented 2 years ago

Describe the issue

I ran RepeatMasker using a mixed library that combines de novo repeats with transposable elements known from other species (created with DeepTE).

The table output is attached. Is the problem caused by the header names in the library? The .tbl output seems incomplete, given that the .out file contains the following TE classes: ClassII_DNA_CACTA_MITE __ClassII_DNA_Harbinger_MITE ClassII_DNA_Harbinger_nMITE ClassII_DNA_Mutator_MITE __ClassII_DNA_Mutator_nMITE ClassII_DNA_PiggyBac_nMITE ClassIII_Helitron ClassI_LTR_BEL ClassI_nLTR ClassI_nLTR_LINE ClassI_nLTR_LINE_I __ClassI_nLTR_LINE_Jockey ClassI_nLTR_LINE_R2 __ClassI_nLTR_PLE ...and so on!

In my customized library, I have two different header types:

Euro_eel_AZBK01S000145.1_16703#__ClassI_nLTR_LINE_L1 #customized library

TE_00016804_INT#__ClassI_nLTR_LINE_Jockey #denovo library

Thanks for the support!

Best regards, Marco

rmhubley commented 2 years ago

RepeatMasker has a fairly strict classification nomenclature, which makes it difficult to generate complete *.tbl files for custom libraries. For example, if your TEs are not named using the format "id#class/subclass", RepeatMasker will not recognize the standard class categories. I would suggest you use the util/buildSummary.pl script and ignore everything but the per-family accounting it provides. You can then group the families however you like to get summary statistics.
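
As a rough illustration of the "group them however you like" step, here is a minimal Python sketch. It does not use or parse the output of util/buildSummary.pl; instead it reads rows in the standard RepeatMasker .out column layout directly, which is a swapped-in shortcut. The function names `bp_per_class` and `group_totals`, the example rows, and the choice to strip leading underscores from class names are all assumptions made for illustration. It also sums annotation lengths naively, so overlapping annotations are counted more than once and the totals may differ from what buildSummary.pl reports.

```python
from collections import defaultdict


def bp_per_class(out_lines):
    """Sum annotated bases per repeat class from RepeatMasker .out rows.

    Assumes the usual whitespace-separated .out layout, where column 6 and
    column 7 are the query begin/end and column 11 is the repeat class.
    Overlapping annotations are NOT merged, so bases can be double-counted.
    """
    totals = defaultdict(int)
    for line in out_lines:
        fields = line.split()
        # Data rows have at least 15 columns and start with an integer score;
        # this skips the header lines and blank lines.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        # Assumption: a leading "__" in the class name is not meaningful.
        repeat_class = fields[10].lstrip("_")
        totals[repeat_class] += end - begin + 1
    return dict(totals)


def group_totals(totals, depth=2):
    """Merge classes that share their first `depth` underscore-separated parts."""
    grouped = defaultdict(int)
    for name, bp in totals.items():
        key = "_".join(name.split("_")[:depth])
        grouped[key] += bp
    return dict(grouped)


# Made-up rows for illustration only (not taken from a real run).
example_rows = [
    "   SW   perc perc perc  query      position in query    matching  repeat  position in repeat",
    "score   div. del. ins.  sequence   begin end   (left)   repeat    class/family  begin end (left) ID",
    "",
    " 1200 10.5 0.0 1.2 scaffold_1  101  400 (9600) + TE_00016804_INT __ClassI_nLTR_LINE_Jockey 1 300 (50) 1",
    "  850 12.0 2.1 0.0 scaffold_1 1001 1200 (8800) C Euro_eel_AZBK01S000145.1_16703 __ClassI_nLTR_LINE_L1 (10) 900 701 2",
    "  400  8.0 0.0 0.0 scaffold_2   51  150  (500) + TE_00000001 ClassII_DNA_Mutator_MITE 1 100 (0) 3",
]

totals = bp_per_class(example_rows)
grouped = group_totals(totals, depth=2)
print(totals)
print(grouped)
```

With `depth=2` the three example classes collapse into `ClassI_nLTR` and `ClassII_DNA`; raising `depth` keeps finer categories. For a real run, the lines would come from the .out file, for example `bp_per_class(open("myseq.fa.out"))`, where the file name is only a placeholder.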