aaranyue / quarTeT

A telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification
http://atcgn.com:8080/quarTeT/home.html

usage issues #6

Closed zhangwenda0518 closed 1 year ago

zhangwenda0518 commented 1 year ago

Thanks, teacher, for developing this easy-to-use T2T assembly tool.

I ran GapFiller on my assembly together with HiFi data, but it could not close any further gaps, perhaps because I had already run TGS-GapCloser in a previous step. I have a question: does the parameter -g GAPCLOSER_CONTIG [GAPCLOSER_CONTIG ...] refer to an assembly produced by other software, or to the data used for the assembly?

I also used TeloExplorer and CentroMiner to get the following graphs. I found a problem with the length scale in the legend, but I can't find a parameter to modify it.

Also, for centromere positioning, is it possible to show the centromere length range? My species is supposed to have long centromeres, and I want to determine their length. Can you give me some advice?

Thanks!

image

image

Echoring commented 1 year ago
  1. GAPCLOSER_CONTIG accepts either assembled contigs or raw reads (in FASTA format). However, gap filling works better when well-assembled contigs are supplied. If you use HiFi data to close gaps, you can assemble the reads first with hifiasm (https://github.com/chhylp123/hifiasm).
  2. The length scale in the legend is not part of our design. The figures we draw ourselves have no such scale, so you should check your R setup.
  3. The centromere length is shown in the 4th column of the .candidate file. You can collect the lengths like this: `grep ^$'\t' -v .candidate | cut -f 4`
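
For anyone who prefers to collect the lengths in a script, here is a minimal Python sketch of the same idea as the one-liner in point 3. It assumes, as that command implies, that the .candidate file is tab-separated, that rows starting with a tab are sub-records to skip, and that the 4th column holds an integer length. The function name `candidate_lengths` and the sample rows are invented for illustration and are not part of quarTeT. Unlike `cut`, the sketch silently drops rows whose 4th field is not a number (for example a header).

```python
import io


def candidate_lengths(handle, column=4):
    """Collect integer values from a 1-based column of top-level rows."""
    lengths = []
    for line in handle:
        # Same filter as `grep -v` on lines that begin with a tab.
        if line.startswith("\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < column:
            continue
        value = fields[column - 1]
        # Keep numeric values only; headers and malformed rows are skipped.
        if value.isdigit():
            lengths.append(int(value))
    return lengths


# Invented sample rows, NOT real quarTeT output.
sample = io.StringIO(
    "Chr1\t1000\t6000\t5000\tx\n"
    "\tsub\t1\t2\t3\n"
    "Chr2\t200\t900\t700\tx\n"
)
print(candidate_lengths(sample))
```

With a real file, the call would be `candidate_lengths(open(path))`, where `path` points to your own .candidate file.
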
zhangwenda0518 commented 1 year ago

Thanks for your quick reply. I have a follow-up to the first question. I have used quickmerge and RagTag to improve the assembly N50. Is the principle by which quarTeT fills gaps using assembled contigs similar to theirs? What I care about is whether quarTeT is better suited for use after Hi-C scaffolding. In any case, quarTeT is more convenient to use.

On the second question, about the scale: the RIdeogram dependency that I had previously installed locally is RIdeogram_0.2.3.tar.gz, from https://github.com/TickingClock1992/RIdeogram_test. This version has the scale function. I modified quartet_util.py, adding the parameter Ruler = 5 to suit the size of my genome, and the figure now displays normally:

`quartet_util.py:251:ideogram(karyotype = chr, label = label, label_type = "marker",Ruler = 5)`

`quartet_util.py:257:ideogram(karyotype = chr,Ruler = 5)`

image

Echoring commented 1 year ago

The strategies of these programs are largely the same, with only minor differences. I haven't compared quarTeT with quickmerge. As for RagTag, it assumes that the reference genome is more reliable, so sequences inconsistent with the reference are more likely to be treated as errors. Moreover, the RagTag “patch” tool applies an aggressive strategy that may discard variation or insert large segments in order to close the gaps. It also renames all sequences, in an order that does not match the input, which often confuses users. In contrast, quarTeT adopts a conservative strategy and never modifies the raw sequence, to avoid losing variation. In short, RagTag is better for re-assembling a genome against a high-quality reference, and quarTeT is better for de novo assembly guided by a Hi-C scaffold.

The intended quarTeT workflow is to use Hi-C data to scaffold pseudo-chromosomes as a reference, use AssemblyMapper to assemble a draft genome, and then use GapFiller to close the gaps. However, a reference genome can also be used, provided it is highly homologous.
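
To check whether a gap-filling step actually changed anything, one option is to count the gaps in the assembly before and after. The sketch below is not part of quarTeT; it assumes gaps are written as runs of `N` (or `n`) in the FASTA sequence, which is the usual convention, and the names `read_fasta` and `gap_lengths` are made up for this example.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_lengths(sequence):
    """Return the length of every run of N/n in the sequence."""
    return [m.end() - m.start() for m in re.finditer(r"[Nn]+", sequence)]


# Tiny invented example: chr1 has two gaps, chr2 has none.
fasta = ">chr1 demo\nACGTNNNNAC\nGTNNACGT\n>chr2\nACGT\n"
for name, seq in read_fasta(fasta.splitlines()):
    print(name, len(gap_lengths(seq)), gap_lengths(seq))
```

Running it on the assembly before and after GapFiller shows per-chromosome whether any gap was closed or shortened.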