aaranyue / quarTeT

A telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification
http://atcgn.com:8080/quarTeT/home.html

Setting of the Telomere parameters #1

Closed Wenwen012345 closed 12 months ago

Wenwen012345 commented 1 year ago

Dear @aaranyue @Echoring

Am I the first? I happen to have just discovered the tool that caters to my needs.

I have some questions. The species I am studying is Rhododendron, and the telomere repeat motif reported in the relevant literature is TTTAGGG (Nie et al. 2023, Shi et al. 2023), whereas the tool reports AAACCCT for my genome (see below). I am slightly puzzled by this. Can you please explain?

The second point is about the choice of the "minimum number of repeats". I noticed that the minimum number of repeats in the online version of the telomere screen is 100 by default, but the two papers mentioned above seem to set the minimum number of repeats to 5, so I am a bit confused.

For my genome, the final result from the online version (with the default parameters) was: Telomere repeat monomer: AAACCCT; Both telomere found: 2; Only one telomere found: 7; No telomere found: 537. My data are HiFi and Hi-C sequencing. I don't know whether this is a reasonable result or whether I made some operational mistake. RbTelo11111.telo.info.txt

Sorry, I am not familiar with this aspect of telomeres, so I would greatly appreciate it if you could answer some of these possibly "basic" questions. Thank you very much for your valuable time and expertise, and I look forward to your reply and guidance. Best regards!

References: Shuai Nie and others, Gapless genome assembly of azalea and multi-omics investigation into divergence between two species with distinct flower color, Horticulture Research, Volume 10, Issue 1, January 2023, uhac241, https://doi.org/10.1093/hr/uhac241

Xiaoya Shi and others, The complete reference genome for grapevine (Vitis vinifera L.) genetics and breeding, Horticulture Research, Volume 10, Issue 5, May 2023, uhad061, https://doi.org/10.1093/hr/uhad061

Echoring commented 1 year ago

A telomere repeat can be read starting at any position within the motif and from either strand. AAACCCT -(shift the start by 3 bases)-> CCCTAAA -(reverse complement)-> TTTAGGG, so they are equivalent. The minimum repeat number is set to 100 following the default of tidk explore (https://github.com/tolkit/telomeric-identifier). This works for most species. Telomeres with a low repeat count may be incorrect or misassembled. If the telomeres of your species have special characteristics, change the setting as needed. If telomeres are your main focus, we recommend using tidk (https://github.com/tolkit/telomeric-identifier) for a detailed survey. Our program only gives a simple summary.
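
To make the equivalence above concrete, here is a minimal Python sketch that checks whether two motifs are the same repeat up to a shift of the start position and a strand flip. The function names are illustrative only and are not part of quarTeT or tidk.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def rotations(seq: str) -> set:
    """All cyclic shifts of seq, i.e. every possible start position in the repeat."""
    return {seq[i:] + seq[:i] for i in range(len(seq))}


def same_telomere_motif(a: str, b: str) -> bool:
    """True if b is a rotation of a, or a rotation of a's reverse complement."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        return False
    return b in rotations(a) or b in rotations(reverse_complement(a))


if __name__ == "__main__":
    # Monomer reported by the tool vs. the motif given in the literature
    print(same_telomere_motif("AAACCCT", "TTTAGGG"))
```

With this check, AAACCCT and TTTAGGG compare as the same motif, while an unrelated motif such as TTAGGG does not.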

Wenwen012345 commented 1 year ago

@Echoring Thank you very much for your reply. Another minor question: can the results from your tool, quarTeT, be used to draw a diagram of telomere/centromere positions like the one below? If so, how can this be done? A simple description would be sufficient.

image

Echoring commented 1 year ago

quarTeT draws a similar overview by default, but in a vertical layout, using RIdeogram (an R package). If you use the local version of quarTeT, the necessary data can be found in the 'tmp' directory.

Wenwen012345 commented 1 year ago

Thank you for your reply. I saw the PNG results of that run. I'll try running it later with a genome file containing only the 13 presumed chromosome sequences. It would be nice to have PDF results in a future update.

Anyway thanks for the help!

Wenwen012345 commented 1 year ago

Hi, @Echoring, I'm not sure whether my job running TeloExplorer and CentroMiner together is stuck. It has been running for over half an hour. My genome is 814M in size, and the TE annotation file (from EDTA, https://github.com/oushujun/EDTA) is 171M.

If I enter just the genome file alone, it still seems to be stuck.

Echoring commented 1 year ago

I see your job is running. It is working on chr4 now. CentroMiner is the slower step, so please be patient.

Wenwen012345 commented 1 year ago

@Echoring Okay. Not quite sure why the figure won't load.

image

atcgn.com_t2t_bioRepository_user_dir_Telo_37973233-996e-4665-b61e-17a50664755d_RbquarTeT.telo.info.txt

Echoring commented 1 year ago

I found that the figure drawing failed. Your genome has so many gaps that R exceeds the C stack limit. quarTeT is designed for near-T2T genomes, and this many gaps cannot be drawn in a figure of this size.

I have added a warning message for when figure drawing fails.
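
For anyone who wants to check how gappy an assembly is before submitting it, the sketch below counts gaps per sequence in a FASTA file. It assumes gaps are encoded as runs of N, which is common but not guaranteed. The function name and the `min_len` parameter are hypothetical, and this is not how quarTeT itself detects gaps.

```python
import re
from typing import Dict, Iterable


def count_gaps(fasta_lines: Iterable[str], min_len: int = 1) -> Dict[str, int]:
    """Count runs of N (assumed gap encoding) of at least min_len per sequence."""
    seqs: Dict[str, list] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name
            fields = line[1:].split()
            name = fields[0] if fields else ""
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    gap_re = re.compile(r"[Nn]{%d,}" % min_len)
    # Join the lines first so a gap spanning a line break is counted once
    return {n: len(gap_re.findall("".join(parts))) for n, parts in seqs.items()}


if __name__ == "__main__":
    example = [">chr1 demo", "ACGTNNNNAC", "GTNNA", ">chr2", "ACGT"]
    for seq_name, n_gaps in count_gaps(example).items():
        print(seq_name, n_gaps)
```

To run it on a real assembly, pass an open file handle, for example `count_gaps(open("genome.fa"))`, where `genome.fa` is a placeholder path.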

Wenwen012345 commented 1 year ago

Thanks so much for your help