Closed JMUwenjian closed 4 months ago
Hi!
Specifically for LTR insertions, the general way we calculate age is to extract the LTR sequences from each end of full-length LTR elements and compare the divergence between them. To do this, you will need to use specific programs to identify intact LTR elements. You can extract these from the Earl Grey outputs if you go to: /path/to/[species]EarlGrey/[species]_mergedRepeats/looseMerge/[species]LtrFinder
and then look at the GFF3 file. You will see entries like this:
ctg_1 LTR_FINDER_parallel repeat_region 327141 333079 . + . ID=repeat_region1
ctg_1 LTR_FINDER_parallel LTR_retrotransposon 327141 333079 . + . ID=LTR_retrotransposon1;Parent=repeat_region1;tsd=CTAGC;ltr_identity=0.852;seq_number=0
ctg_1 LTR_FINDER_parallel long_terminal_repeat 327141 327343 . + . Parent=LTR_retrotransposon1
ctg_1 LTR_FINDER_parallel long_terminal_repeat 332857 333079 . + . Parent=LTR_retrotransposon1
In this case, the two bottom rows with long_terminal_repeat labels are the LTRs at either end of the full-length element, so you can extract the sequences at these coordinates. You will also see that on the LTR_retrotransposon line, column 9 has a lot of information, including ltr_identity=0.852. This is the sequence similarity between the 5' and 3' LTR sequences. If this is 1, then the LTRs are identical and the insertion is extremely recent. If you have a neutral mutation rate for your species, you can then apply this to the divergence (1 - ltr_identity) to estimate time of insertion (with error, as this assumes TEs are neutral, when they are likely at least under weak purifying selection). Even with just the divergence numbers, you will be able to see which LTRs are more recent and which are more ancient.
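As a rough illustration of the steps above, the sketch below reads GFF3 lines like the ones shown, collects the ltr_identity value and the two long_terminal_repeat coordinate pairs for each element, and slices an LTR out of a contig sequence. The function names (parse_ltr_gff3, slice_ltr) are hypothetical helpers written for this example, not part of Earl Grey or LTR_FINDER_parallel. It assumes nine whitespace- or tab-separated columns, as in the entries above, and relies on GFF3 coordinates being 1-based and inclusive.

```python
from collections import defaultdict

# Example entries copied from the GFF3 lines quoted in this thread.
GFF_EXAMPLE = """\
ctg_1 LTR_FINDER_parallel repeat_region 327141 333079 . + . ID=repeat_region1
ctg_1 LTR_FINDER_parallel LTR_retrotransposon 327141 333079 . + . ID=LTR_retrotransposon1;Parent=repeat_region1;tsd=CTAGC;ltr_identity=0.852;seq_number=0
ctg_1 LTR_FINDER_parallel long_terminal_repeat 327141 327343 . + . Parent=LTR_retrotransposon1
ctg_1 LTR_FINDER_parallel long_terminal_repeat 332857 333079 . + . Parent=LTR_retrotransposon1
"""


def parse_ltr_gff3(lines):
    """Return {element_id: {"seqid", "strand", "identity", "ltrs"}}.

    "ltrs" is a list of (start, end) tuples, 1-based inclusive, taken from
    the long_terminal_repeat rows whose Parent is the element.
    """
    elements = defaultdict(lambda: {"identity": None, "ltrs": []})
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF3 comments/directives
        cols = line.split(None, 8)  # at most 9 columns; attributes stay whole
        if len(cols) < 9:
            continue
        seqid, _src, ftype, start, end, _score, strand, _phase, attrs = cols
        attr = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        if ftype == "LTR_retrotransposon" and "ID" in attr:
            rec = elements[attr["ID"]]
            rec["seqid"] = seqid
            rec["strand"] = strand
            if "ltr_identity" in attr:
                rec["identity"] = float(attr["ltr_identity"])
        elif ftype == "long_terminal_repeat" and "Parent" in attr:
            elements[attr["Parent"]]["ltrs"].append((int(start), int(end)))
    return dict(elements)


def slice_ltr(contig_seq, start, end):
    """Cut a 1-based inclusive GFF3 interval out of a contig sequence string."""
    return contig_seq[start - 1:end]


if __name__ == "__main__":
    for name, rec in parse_ltr_gff3(GFF_EXAMPLE.splitlines()).items():
        divergence = 1.0 - rec["identity"]
        print(name, rec["seqid"], rec["ltrs"], f"divergence={divergence:.3f}")
```

Sorting elements by divergence alone already gives a relative ordering from recent to ancient insertions, without needing a mutation rate.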
Thank you very much for your reply, which resolved my questions. So can I understand the calculation of LTR insertion time as follows: T = (1 - ltr_identity) / (2r), where ltr_identity is the value given on the LTR_retrotransposon line and r is the number of substitutions per synonymous site per year? Looking forward to your reply. Wish you all the best.
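For reference, the calculation asked about here can be sketched as below, taking the divergence as 1 minus the ltr_identity value. The factor of 2 is there because both LTRs accumulate substitutions independently after insertion. The rate used in the example (1e-8 substitutions per site per year) is a placeholder, not a recommendation; substitute a rate appropriate for your species. The Jukes-Cantor correction for multiple hits is an optional extra that was not discussed in this thread, and the function names are hypothetical.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    if p == 0.0:
        return 0.0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time(ltr_identity, rate, correct=False):
    """Estimate insertion time in years as T = K / (2 * rate).

    ltr_identity: similarity between the 5' and 3' LTRs (0..1)
    rate:         assumed neutral substitution rate per site per year
    correct:      if True, use the Jukes-Cantor distance instead of raw divergence
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    divergence = 1.0 - ltr_identity
    k = jc69_distance(divergence) if correct else divergence
    return k / (2.0 * rate)


if __name__ == "__main__":
    PLACEHOLDER_RATE = 1e-8  # assumption for illustration only
    raw = insertion_time(0.852, PLACEHOLDER_RATE)
    corrected = insertion_time(0.852, PLACEHOLDER_RATE, correct=True)
    print(f"uncorrected: {raw:.3e} years, JC69-corrected: {corrected:.3e} years")
```

As noted in the reply above, the estimate carries error because it assumes the LTRs evolve neutrally, so it is best read as approximate.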
Dear author, I am immensely grateful to you for providing such a convenient and useful tool. I have successfully completed the data testing and achieved satisfactory annotation results. Now I have a question I would like to ask. Specifically, if I intend to analyse the insertion timing of LTRs based on the output generated by EarlGrey, which output file should I use, and what software would be most suitable for this purpose?