Open Han-Cao opened 1 month ago
I can't say for sure from here, but I would suspect a misjoin in your assembly. We ran into this issue in the original HPRC paper for HG02080#1#JAHEOW010000073.1 and I manually split it as described here.
Whether it's an artifact or somehow a real event, your best bet is probably to split it. The alternative is to use --noSplit and just let Cactus align all the different chromosomes together. The problem with doing this is that you're potentially letting data artifacts add a lot of complexity to your graph.
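For readers who want to script the manual split, here is a minimal Python sketch of cutting one contig at a breakpoint and writing both pieces back to a FASTA file. It is an illustration only, not the procedure from the linked tutorial: the function names, the `_sub_START_END` naming of the pieces, and the single 0-based breakpoint are all assumptions made for this example, and the breakpoint itself has to come from your own inspection of the alignments.

```python
def read_fasta(handle):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the contig name.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_contig(name, seq, breakpoint):
    """Split a contig at a 0-based breakpoint into two named pieces.

    The "_sub_START_END" suffix is a hypothetical naming choice for this sketch.
    """
    if not 0 < breakpoint < len(seq):
        raise ValueError("breakpoint must fall strictly inside the contig")
    left = (f"{name}_sub_0_{breakpoint}", seq[:breakpoint])
    right = (f"{name}_sub_{breakpoint}_{len(seq)}", seq[breakpoint:])
    return [left, right]


def split_fasta(in_handle, out_handle, contig, breakpoint, width=60):
    """Copy a FASTA, replacing `contig` with its two pieces."""
    for name, seq in read_fasta(in_handle):
        if name == contig:
            pieces = split_contig(name, seq, breakpoint)
        else:
            pieces = [(name, seq)]
        for piece_name, piece_seq in pieces:
            out_handle.write(f">{piece_name}\n")
            # Wrap sequence lines to a fixed width.
            for i in range(0, len(piece_seq), width):
                out_handle.write(piece_seq[i:i + width] + "\n")
```

A call would look like `split_fasta(open("in.fa"), open("out.fa", "w"), "my_contig", 56000000)`, where the file names, contig name, and coordinate are placeholders.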
Thank you very much! I will follow the tutorial to manually split them.
Also, according to Figure 1B of the HPRC paper, there are many interchromosomal joins. May I ask why you decided to manually split only the HG02080#1#JAHEOW010000073.1 contig?
This one was the only one whose alignment we were confident enough in -- the others were all in very repetitive regions such as centromeres and rDNA arrays.
Hi,
After running cactus-graphmap-split, I found that a few samples can have a very large _AMBIGUOUS_ contig. I manually checked some samples and found that this is due to a large contig that maps to two chromosomes. For example, one sample has a 94 MB contig mapped to chr9 (56 MB) and chr11 (38 MB).
After the split, this sample lacks a large proportion of chr9 and chr11 in one assembly.
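To make the numbers in this example concrete, the short Python sketch below computes how much of the contig the best-matching chromosome covers and how it compares with the runner-up. This is only arithmetic on the figures quoted above; it does not reproduce the actual assignment rule or thresholds used by cactus-graphmap-split, and the function name is made up for illustration.

```python
def coverage_summary(contig_len, mapped):
    """Summarise how a contig's mapped bases are spread over chromosomes.

    mapped: dict of chromosome name -> mapped length (same unit as contig_len).
    Returns (best chromosome, fraction of contig it covers, ratio to runner-up).
    """
    ranked = sorted(mapped.items(), key=lambda kv: kv[1], reverse=True)
    best_chrom, best_len = ranked[0]
    second_len = ranked[1][1] if len(ranked) > 1 else 0
    fraction = best_len / contig_len
    ratio = best_len / second_len if second_len else float("inf")
    return best_chrom, fraction, ratio


# Figures from the example: 94 MB contig, 56 MB on chr9, 38 MB on chr11.
chrom, fraction, ratio = coverage_summary(94, {"chr9": 56, "chr11": 38})
print(chrom, round(fraction, 3), round(ratio, 3))
```

Here the top chromosome covers roughly 60% of the contig and is only about 1.5 times the runner-up, which is the kind of near-even split that makes a single-chromosome assignment hard to justify.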
According to the config of cactus-graphmap-split, I can keep this contig by adjusting the threshold of uf. However, this would affect more contigs and still would not keep the sequence for both chromosomes.
So, my questions are:
Thank you!