sjin09 opened this issue 7 years ago (status: Open)
I have also observed a number of contigs whose sequences change significantly between FALCON and FALCON_UNZIP. I have uploaded a dotplot illustrating one example: the horizontal sequence is from FALCON and the vertical sequence is from FALCON_UNZIP.
In such cases, do you have recommendations for diagnosing the changes, determining why the sequence changed, and checking whether the change is erroneous?
I assume that some of the changes come from haplotype differences, but I also observe a number of haplotigs that have no significant matches to their respective pairs.
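As a first pass before inspecting alignments, something like the following could flag the contigs whose length changed the most between the two assemblies. This is only a rough sketch: the function names and the 10% threshold are my own placeholders, and I am assuming that contig IDs match between the FALCON and FALCON_UNZIP outputs once any "|suffix" on the header is stripped. A length change does not show that a sequence is wrong; it only helps choose which contigs to look at in a dotplot.

```python
def fasta_lengths(lines):
    """Return {contig_id: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the ID is the first token, minus any "|suffix".
            name = fields[0].split("|")[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def length_changes(falcon_lines, unzip_lines, min_fraction=0.1):
    """List contigs present in both inputs whose length changed by at
    least min_fraction (hypothetical threshold) relative to FALCON."""
    before = fasta_lengths(falcon_lines)
    after = fasta_lengths(unzip_lines)
    flagged = []
    for name in sorted(before.keys() & after.keys()):
        if before[name] == 0:
            continue  # skip empty records to avoid dividing by zero
        change = (after[name] - before[name]) / before[name]
        if abs(change) >= min_fraction:
            flagged.append((name, before[name], after[name], change))
    return flagged
```

Both functions take any iterable of lines, so open file handles for p_ctg.fa and all_p_ctg.fa can be passed directly.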
Best, Jin
I have successfully run FALCON (https://github.com/PacificBiosciences/FALCON/issues/514) on the human genome and am now running FALCON_UNZIP. FALCON_UNZIP also completed successfully, but some contigs are absent because their graph is circular, which returns an empty path (#20).
Here are the assembly statistics for p_ctg.fa:
And the assembly statistics for all_p_ctg.fa:
I would like to include some of the circular contigs in consensus calling with arrow. I would love to hear any recommendations for this case.
In addition, I wanted to ask about contigs that are completely absent from all_p_ctg.fa but present in p_ctg.fa. Would it be correct to assume that they have all been incorporated into all_h_ctg.fa? If not, what is the filtering mechanism?
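For reference, the absent contigs can be listed by comparing the FASTA headers of the two files, roughly as sketched below. The function names are placeholders of mine, and two things are assumptions rather than confirmed behaviour: that primary contig IDs are the first header token with any "|suffix" removed, and that haplotig IDs have the form "<primary ID>_<index>".

```python
def fasta_ids(lines):
    """Collect record IDs from an iterable of FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # Assumption: drop any "|suffix" appended to the ID.
                ids.add(fields[0].split("|")[0])
    return ids


def missing_contigs(p_ctg_lines, all_p_ctg_lines):
    """IDs present in p_ctg.fa but absent from all_p_ctg.fa."""
    return sorted(fasta_ids(p_ctg_lines) - fasta_ids(all_p_ctg_lines))


def haplotig_parents(all_h_ctg_lines):
    """Primary IDs that have at least one haplotig.

    Assumption: haplotig IDs look like "<primary ID>_<index>".
    """
    return {hid.rsplit("_", 1)[0] for hid in fasta_ids(all_h_ctg_lines)}
```

Intersecting the result of missing_contigs with haplotig_parents would show which of the absent primary contigs have any haplotig in all_h_ctg.fa at all, although that alone would not prove their sequence was moved there.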
I have also found that many of the absent or empty contigs contain centromeric sequences. I would probably remove them by matching them against sequences from RepBase, and keep only the unique sequences for consensus calling.
Best, Jin