Closed RY-Zheng closed 2 weeks ago
It might be due to overlapping records from HiTE.out being merged in HiTE.detail.tbl. You can refer to the details implemented in RepeatMasker's buildSummary.pl.
If you specify --annotate 1, it should also generate the HiTE.gff file.
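To check whether joined or overlapping fragments could explain the gap, one option is to compare the raw number of rows per class in HiTE.out with the number of distinct values in the last (ID) column. This is only a sketch: it assumes the standard 15-column RepeatMasker .out layout, and it assumes (not verified against buildSummary.pl) that fragments sharing an ID are counted once in the summary. The function name `count_by_class` is made up for this example.

```python
from collections import defaultdict

def count_by_class(out_lines):
    """Return {class: (raw_rows, distinct_ids)} for RepeatMasker .out rows."""
    rows = defaultdict(int)
    ids = defaultdict(set)
    for line in out_lines:
        f = line.split()
        # Skip the header and blank lines: data rows start with an integer score.
        if len(f) < 15 or not f[0].isdigit():
            continue
        seqid, cls, rm_id = f[4], f[10], f[14]
        rows[cls] += 1
        # Key on (sequence, ID) in case IDs restart between sequences or chunks.
        ids[cls].add((seqid, rm_id))
    return {cls: (rows[cls], len(ids[cls])) for cls in rows}
```

Calling it as `count_by_class(open("HiTE.out"))` and comparing the second number for LTR/Gypsy with the Count column in HiTE.detail.tbl would show whether this explanation fits your data.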
I specified '--annotate 1' and the 'HiTE.gff' file was generated. But the HiTE.gff file doesn't seem to contain a specific classification, such as LTR/Gypsy. Also, the number of TEs in HiTE.gff is consistent with HiTE.out, but it is still inconsistent with the counts in HiTE.tbl and HiTE.detail.tbl. Is there a filtering mechanism for HiTE.tbl and HiTE.detail.tbl? I want to extract the sequences of all the records counted in HiTE.tbl and HiTE.detail.tbl. Thank you very much.
HiTE.gff:
Chr01.1 RepeatMasker similarity 26 10844 0.4 + . Target "Motif:(CTAAACC)n" 1 10799
Chr01.1 RepeatMasker similarity 10845 10998 7.2 - . Target "Motif:TIR_617" 366 526
Chr01.1 RepeatMasker similarity 10859 11011 6.9 + . Target "Motif:TIR_1" 40 203
Chr01.1 RepeatMasker similarity 11013 11287 9.8 - . Target "Motif:LTR_1087_LTR" 1419 1577
Chr01.1 RepeatMasker similarity 11270 11378 11.8 + . Target "Motif:TIR_699" 257 350
Chr01.1 RepeatMasker similarity 11380 12150 9.2 + . Target "Motif:(ATTATGAC)n" 1 795
Chr01.1 RepeatMasker similarity 12151 12450 11.8 + . Target "Motif:TIR_699" 351 672
Chr01.1 RepeatMasker similarity 12451 12503 11.4 - . Target "Motif:TIR_229" 865 1229
Chr01.1 RepeatMasker similarity 12504 12609 3.5 + . Target "Motif:TIR_435" 804 1280
Chr01.1 RepeatMasker similarity 12610 12684 8.5 + . Target "Motif:(ATTATGAC)n" 1 75
Chr01.1 RepeatMasker similarity 12686 12909 10.3 - . Target "Motif:TIR_1" 418 614
Chr01.1 RepeatMasker similarity 12910 13264 12.6 + . Target "Motif:LTR_1087_LTR" 1043 1424
Chr01.1 RepeatMasker similarity 12988 13062 9.2 - . Target "Motif:TIR_435" 781 1306
Chr01.1 RepeatMasker similarity 13063 13482 16.4 + . Target "Motif:TIR_229" 354 777
Chr01.1 RepeatMasker similarity 13483 13631 10.3 + . Target "Motif:(TCATAATG)n" 1 151
You can follow this guide (https://www.biostars.org/p/382150/) to convert the .out file to a .gff file. In fact, both the .out and .tbl files are generated by RepeatMasker, as invoked by HiTE. Therefore, to understand the specific correspondence between the .out and .tbl files, you may need to refer to the RepeatMasker code.
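If the goal is a .gff that keeps the classification (e.g. LTR/Gypsy), a small converter can also be written directly against the .out columns. The sketch below is not the method from the linked guide and not what HiTE does internally. It assumes the standard 15-column RepeatMasker .out layout, puts the percent divergence in the score column as in the HiTE.gff excerpt above, and writes GFF3-style attributes. The lowercase attribute names `class` and `rm_id` and the function names are made up for this example.

```python
def parse_out_line(line):
    """Parse one RepeatMasker .out row; return None for header/blank lines."""
    f = line.split()
    if len(f) < 15 or not f[0].isdigit():
        return None
    return {
        "seqid": f[4],
        "start": int(f[5]),          # 1-based, inclusive
        "end": int(f[6]),
        "strand": "-" if f[8] == "C" else "+",  # .out marks complement as "C"
        "name": f[9],
        "cls": f[10],                # class/family, e.g. LTR/Gypsy
        "div": f[1],                 # percent divergence
        "rm_id": f[14],
    }

def out_to_gff(out_lines):
    """Yield tab-separated GFF3-style lines that carry the class/family."""
    for line in out_lines:
        rec = parse_out_line(line)
        if rec is None:
            continue
        attrs = f"Name={rec['name']};class={rec['cls']};rm_id={rec['rm_id']}"
        yield "\t".join([
            rec["seqid"], "RepeatMasker", "similarity",
            str(rec["start"]), str(rec["end"]),
            rec["div"], rec["strand"], ".", attrs,
        ])
```

For example, `print("\n".join(out_to_gff(open("HiTE.out"))))` would write one feature per .out row, and `grep "class=LTR/Gypsy"` on the result would then select a single class.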
Ok, thank you. What I want to ask is how to extract the same number of sequences as recorded in HiTE.detail.tbl. For example, HiTE.detail.tbl contains:
Class              Count      bpMasked      %masked
=====              =====      ========      =======
DNA                --         --            --
    CMC-EnSpm      2396       692283        0.03%
    Crypton        822        246074        0.01%
    MULE           93620      19726548      0.74%
    Merlin         2169       816875        0.03%
    PIF-Harbinger  95019      20725099      0.77%
    TcMar          222014     84862829      3.17%
    hAT            35133      7498474       0.28%
LINE               --         --            --
    L1             70974      84627363      3.16%
LTR                --         --            --
    Copia          194336     84748164      3.16%
    ERV            83462      56049195      2.09%
    Gypsy          1123194    1771812252    66.17%
Here the number of LTR/Gypsy records is given as 1123194. How can I extract the complete sequences of those records?
Thank you
I'm not sure why the number of records in the .detail.tbl file is inconsistent with that in the .out file, as this is generated using RepeatMasker's buildSummary.pl, as I mentioned.
If you want to extract LTR/Gypsy sequences, you can easily extract the corresponding sequences based on the .gff file and calculate metrics like Count and bpMasked yourself.
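A minimal sketch of that do-it-yourself route is shown here. It reads the intervals from HiTE.out rather than the .gff, since the .out rows carry the class/family column. It assumes the standard 15-column RepeatMasker .out layout and a genome FASTA small enough to hold in memory. The bpMasked figure here is the non-overlapping base count for the chosen class, which may not match buildSummary.pl exactly. All function names are made up for this example.

```python
def read_fasta(lines):
    """Load a FASTA into {name: sequence}; name is the first word of the header."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome

def class_intervals(out_lines, cls):
    """Return [(seqid, start, end, strand)] for .out rows of one class/family."""
    hits = []
    for line in out_lines:
        f = line.split()
        if len(f) < 15 or not f[0].isdigit() or f[10] != cls:
            continue
        hits.append((f[4], int(f[5]), int(f[6]), "-" if f[8] == "C" else "+"))
    return hits

def merged_bp(intervals):
    """Count bases covered by the intervals, counting overlapping bases once."""
    total, cur = 0, None
    for seqid, start, end, _ in sorted(intervals):
        if cur and cur[0] == seqid and start <= cur[2]:
            cur = (seqid, cur[1], max(cur[2], end))
        else:
            if cur:
                total += cur[2] - cur[1] + 1
            cur = (seqid, start, end)
    if cur:
        total += cur[2] - cur[1] + 1
    return total

_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")

def extract(genome, intervals):
    """Yield (header, sequence); .out coordinates are 1-based and inclusive."""
    for seqid, start, end, strand in intervals:
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
        yield f"{seqid}:{start}-{end}({strand})", seq
```

With `hits = class_intervals(open("HiTE.out"), "LTR/Gypsy")`, `len(hits)` is the raw record count, `merged_bp(hits)` is the non-overlapping base count, and `extract(read_fasta(open("genome.fa")), hits)` yields the sequences (`genome.fa` is a placeholder file name).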
Dear developers, I have run HiTE and it produced multiple output files, including HiTE.out and HiTE.detail.tbl. I manually checked the number of sequences in HiTE.out and found it was inconsistent with the number recorded in HiTE.detail.tbl: the number of HiTE.out records was higher. Could you please explain this? Also, is there a way to convert HiTE.out to an annotation file so that I can extract specific sequences, such as all LTR/Gypsy sequences? Or is there a way to extract sequences based on the statistics in HiTE.detail.tbl or HiTE.tbl?
$ grep -c "LTR/Gypsy" HiTE.out
1340604
HiTE.detail.tbl:
Class              Count      bpMasked      %masked
=====              =====      ========      =======
DNA                --         --            --
    CMC-EnSpm      2396       692283        0.03%
    Crypton        822        246074        0.01%
    MULE           93620      19726548      0.74%
    Merlin         2169       816875        0.03%
    PIF-Harbinger  95019      20725099      0.77%
    TcMar          222014     84862829      3.17%
    hAT            35133      7498474       0.28%
LINE               --         --            --
    L1             70974      84627363      3.16%
LTR                --         --            --
    Copia          194336     84748164      3.16%
    ERV            83462      56049195      2.09%
    Gypsy          1123194    1771812252    66.17%
RC                 --         --            --
    Helitron       172698     49453545      1.85%
SINE               --         --            --
    tRNA           16209      2247155       0.08%
Unknown            121537     31955024      1.19%