Questions about the identification of Centromeres

MuyuenHoshino commented 10 months ago

Dear Immortal2333, I am using your method to identify centromeres. In your plot, each chromosome has an aligned region where Copia, Gypsy, and Helitron are blank, which is the centromere region you identify. In my data, however, this region is not a complete gap; some scattered signal remains within it. Is this normal? Can I still use this method to identify centromeres? Thank you.

MuyuenHoshino commented 10 months ago

I see a similar pattern on other chromosomes in the plot from your paper, such as chr17.

Immortal2333 commented 10 months ago

Hi Muyuen,

Yes, it is. I'd be happy to share some relevant papers with you to address your concerns.

https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005997:

"The main DNA components of the centromere are highly repetitive, such as the 171-bp α-satellite repeat in humans and 150- to 180-bp simple tandem repeats in some flowering plants [1–5]. Long-terminal repeat (LTR) retrotransposons, also known as centromeric retrotransposons (CRs), are often intermingled with tandem repeats and are enriched in plant centromeric regions [6–11]. "

https://www.sciencedirect.com/science/article/pii/S1369526618301286:

"In most plants, the DNA sequences underlying the centromeres consist of megabase arrays of tandemly repeated satellite sequences, interspersed with Gypsy LTR retrotransposons [36,37,38,39]."

Hofstatter, Paulo G., et al. "Repeat-based holocentromeres influence genome architecture and karyotype evolution." Cell 185.17 (2022): 3153-3168.:

However, if you are struggling to locate centromeres in a case like the following, you might find answers in the paper cited above.

ChIP-seq is often considered the gold standard method for accurately locating centromeres.

Song, Jia-Ming, et al. "Two gap-free reference genomes and a global view of the centromere architecture in rice." Molecular plant 14.10 (2021): 1757-1767.:

I hope these references help. Thank you for your questions!

Best wishes,

Xu

MuyuenHoshino commented 10 months ago

Dear Xu, Thank you for your timely and helpful reply. My data look like the monocentric J. effusus case. We did not do ChIP-seq, so I will use your method. But I am still confused about one point.

"The main DNA components of the centromere are highly repetitive, such as the 171-bp α-satellite repeat in humans and 150- to 180-bp simple tandem repeats in some flowering plants [1–5]. Long-terminal repeat (LTR) retrotransposons, also known as centromeric retrotransposons (CRs), are often intermingled with tandem repeats and are enriched in plant centromeric regions [6–11]. "

From this, I understand that LTRs should be enriched in the centromeric regions. Plot C in the following picture also shows high TE density in the centromeric regions.

But in this picture, LTR/Gypsy and LTR/Copia are low in the circled region.

"In most plants, the DNA sequences underlying the centromeres consist of megabase arrays of tandemly repeated satellite sequences, interspersed with Gypsy LTR retrotransposons [36,37,38,39]."

This text also says that Gypsy LTR retrotransposons are interspersed in the centromeric regions.

Maybe I misunderstood because my English is poor. I feel like there's a contradiction here.

Best wishes

Immortal2333 commented 10 months ago

First of all, transposable elements (TEs) encompass various classes, including LTR (Copia, Gypsy, etc.), TIR (CACTA, hAT, Mutator, etc.), SINE elements, LINE elements, and more. The diverse range of types might contribute to the observed 'TE density' in Fig. C. The approach described in this paper involves:

"For overlapping repeats in different classes, LTR retrotransposons were kept first, next terminal inverted repeats (TIRs), and then short and long interspersed nuclear elements, and finally helitrons. This priority order was based on stronger structural signatures. In addition, the known nested insertion models (LTR into helitron, helitron into LTR, TIR into LTR, LTR into TIR) were retained."

Source: Song, Jia-Ming, et al. "Two gap-free reference genomes and a global view of the centromere architecture in rice." Molecular plant 14.10 (2021): 1757-1767.
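The priority rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the paper: the interval format `(start, end, class)`, the function name `resolve`, and the choice to give SINE and LINE the same rank are all hypothetical.

```python
# Lower rank = higher priority, following the quoted order:
# LTR, then TIR, then SINE/LINE, then Helitron.
# Assumption: SINE and LINE share one rank; the quote does not separate them.
PRIORITY = {"LTR": 0, "TIR": 1, "SINE": 2, "LINE": 2, "Helitron": 3}

# (inner, outer) class pairs that the quoted text says are retained when nested.
NESTED_OK = {("LTR", "Helitron"), ("Helitron", "LTR"),
             ("TIR", "LTR"), ("LTR", "TIR")}


def overlaps(a, b):
    """True if two half-open intervals (start, end, cls) share any base."""
    return a[0] < b[1] and b[0] < a[1]


def is_nested(inner, outer):
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def resolve(annotations):
    """Keep higher-priority repeats where classes overlap, except allowed nestings."""
    kept = []
    # Visit high-priority classes first so they claim their intervals.
    for ann in sorted(annotations, key=lambda a: (PRIORITY[a[2]], a[0])):
        conflict = False
        for k in kept:
            if not overlaps(ann, k) or ann[2] == k[2]:
                continue  # no overlap, or same class: not handled by this rule
            allowed = (
                (is_nested(ann, k) and (ann[2], k[2]) in NESTED_OK)
                or (is_nested(k, ann) and (k[2], ann[2]) in NESTED_OK)
            )
            if not allowed:
                conflict = True
                break
        if not conflict:
            kept.append(ann)
    return sorted(kept)


# Hypothetical coordinates, for illustration only.
example = [
    (100, 600, "LTR"),
    (500, 900, "TIR"),        # partially overlaps the LTR: dropped
    (1000, 5000, "Helitron"),
    (2000, 3000, "LTR"),      # LTR nested inside a Helitron: both retained
    (6000, 6300, "SINE"),
    (6200, 6500, "Helitron"), # partially overlaps the SINE: dropped
]
resolved = resolve(example)
```

With a rule like this, a Helitron or TIR call that partially overlaps an LTR call is not counted in its own track, which is one reason per-class tracks and an overall "TE density" track can look different in the same region.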

Note also that LTR stands for long terminal repeat, and these elements are long (intact LTR retrotransposons typically span several kb to tens of kb). Consequently, a window in this region usually contains only one or two individual LTR/Copia or LTR/Gypsy elements, as depicted in Fig. E. Although their peaks might not be readily apparent, they do exist within this region.
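To make the window argument concrete, here is a minimal sketch that computes, for each window, both the number of elements and the fraction of bases they cover. The function name `window_stats`, the window size, and all coordinates are assumptions chosen for illustration; they are not taken from the paper or from EDTA output.

```python
def window_stats(elements, chrom_len, window):
    """Return a list of (window_start, n_elements, covered_fraction).

    elements: (start, end) half-open intervals, assumed not to overlap each other.
    """
    stats = []
    for w_start in range(0, chrom_len, window):
        w_end = min(w_start + window, chrom_len)
        n = 0
        covered = 0
        for start, end in elements:
            # Number of bases of this element that fall inside the window.
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > 0:
                n += 1
                covered += overlap
        stats.append((w_start, n, covered / (w_end - w_start)))
    return stats


# Hypothetical LTR/Gypsy calls on a 300 kb stretch, in 100 kb windows.
gypsy = [(10_000, 22_000), (150_000, 158_000),
         (160_000, 171_000), (175_000, 180_000)]
stats = window_stats(gypsy, chrom_len=300_000, window=100_000)
for w_start, n, frac in stats:
    print(f"{w_start:>7}  elements={n}  covered={frac:.2f}")
```

A window holding a single element has a count of 1, which can look almost flat next to windows with many short repeats, even though the element is present. Plotting the covered fraction alongside the count can make such elements easier to see.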

This is a statistic from EDTA:

MuyuenHoshino commented 10 months ago

Dear Xu, Thank you for your explanation. I understand what you mean. Thanks again for your thoughtful reply, which really helped me a lot. Best wishes

MuyuenHoshino commented 9 months ago

Hi Xu, Long time no see. Thank you for all your help; our work is almost finished. My colleagues want me to confirm whether your method uses quantitative criteria to identify centromeres, or whether you simply look for repeats of a certain period (such as 107) in IGV to determine an approximate range.

“We then inferred the region with centromeric repeats and low TE density as the centromeres after zooming one by one.”

Best wishes, Guo

Immortal2333 / Telomeres_and_Centromeres
