aaranyue / quarTeT

A telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification
http://atcgn.com:8080/quarTeT/home.html

Subject: Using QuarTeT GapFiller with Unplaced Scaffolds and Flanks from a related species. #42

Open Isoris opened 1 month ago

Isoris commented 1 month ago

Hello,

I would like to use QuarTeT GapFiller to fill gaps in my Clarias macrocephalus assembly chromosomes using unplaced scaffolds. I have mapped these unplaced scaffolds to a closely related species, Clarias fuscus, and obtained 50 Mb of sequence with high accuracy. However, QuarTeT AssemblyMapper outputs the results with 100 bp gaps.

My aim is to use the contigs along with the flanking regions from Clarias fuscus, making them chimeric. This way, I can run QuarTeT GapFiller with the initial chromosomes from C. macrocephalus and these modified scaffolds that include flanks from C. fuscus. The idea is to provide the necessary homology for effective gap filling, as the original assembly lacks sufficient flank homology for direct gap filling.

Would it be possible to get the output with the reference flanks from AssemblyMapper to achieve this?

Thank you!

Echoring commented 1 month ago

I'm not sure I got your point. Do you mean that when AssemblyMapper places two adjacent contigs, it should not only connect them with 100 N but also merge in some flanking sequence from the reference? It seems hard to define where the border would be. If you want a longer flanking sequence to reach the region homologous to the reference, could that be achieved with a larger -f option?

Isoris commented 1 month ago

More like, I would also like another output containing the contig that was mapped to the reference plus the two flanking sequences of length -f on both sides.

Echoring commented 1 month ago

[image: QQ20241023-124238]

I'm still not sure. In this case (the alignment breaks some distance from the gap), do you want the red circle or the pink circle?

Isoris commented 1 month ago

Hello,

It may seem strange, but I would like to be able to specify a flanking length in the reference, extending outward. For instance, if contig A aligns to the reference at positions 6,000 to 10,000 and we specify an option such as -fo (flank outward) = 3000, it would make a chimeric sequence: 3,000 bases from the reference on the 5' side + the aligned contig + 3,000 bases from the reference on the 3' side.

[image]

This way we can use the chimera to fill gaps in genome assemblies. Contigs are sometimes cut where there are repeats, but in a closely related species the repeat may already have been resolved, for example if that assembly was built with ultra-long reads and happened to include a read spanning the full region.
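
To make the request concrete, here is a minimal Python sketch of the outward-flank idea described above. It is not quarTeT code: the function name `build_chimera` and all of its parameters are hypothetical. It assumes 0-based, half-open reference coordinates and a contig already oriented to the forward strand of the reference. Where the alignment sits close to a reference end, the flank is simply truncated.

```python
def build_chimera(ref_seq, contig_seq, aln_start, aln_end, flank_out):
    """Return ref 5' flank + contig + ref 3' flank.

    Hypothetical helper, not part of quarTeT.
    aln_start, aln_end: 0-based, half-open span on the reference
    that the contig aligned to (an assumed convention).
    flank_out: number of reference bases to take outward on each side.
    """
    if flank_out < 0:
        raise ValueError("flank_out must be non-negative")
    if not 0 <= aln_start <= aln_end <= len(ref_seq):
        raise ValueError("alignment span lies outside the reference")

    # Clamp the flanks so they never run past the reference ends.
    left_start = max(0, aln_start - flank_out)
    right_end = min(len(ref_seq), aln_end + flank_out)

    left_flank = ref_seq[left_start:aln_start]
    right_flank = ref_seq[aln_end:right_end]
    return left_flank + contig_seq + right_flank


# Toy example: the contig is written in lowercase so it stands out.
reference = "AAACCCGGGTTT"
contig = "nnn"  # stands in for a contig aligned to reference[3:9]
chimera = build_chimera(reference, contig, 3, 9, 2)
```

With the numbers from the comment above, the call would be `build_chimera(ref, contig_a, 6000, 10000, 3000)`, taking reference bases 3,000 to 6,000 and 10,000 to 13,000 as flanks under the assumed coordinate convention. Reverse-strand alignments would need the contig reverse-complemented first, which this sketch does not handle.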

Echoring commented 1 month ago

I've made a quick attempt on the branch extract-ref-flanks. I'm not sure it works correctly. Would you like to give it a try?