Open paulvhi opened 1 year ago
The switch error rate is too high. If the parental data is right but hifiasm did something wrong, the hamming error rate might be higher, but the switch error rate should still be lower than 1%. Could you please pick a few long contigs and double-check their phasing?
I attached a document with the contig lengths, the frequency of switches, etc. I also included the trioeval output. This is a highly repetitive and AT-rich insect genome. I hope this helps.
Sorry for the late reply. Some contigs have a very large number of hamming errors, like h1tg000005l, h1tg000013l and h1tg000031l. Could you run yak trioeval on top of the p_utg.gfa, and check whether the nodes in the graph that correspond to these contigs also have a large number of errors? Please see the FAQ here: https://hifiasm.readthedocs.io/en/latest/faq.html#p-hamming. And there is an example: https://github.com/chhylp123/hifiasm/issues/130#issuecomment-862347943
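A note for anyone trying this: as far as I know, yak trioeval reads sequences rather than a graph, so the unitig sequences have to be pulled out of the GFA first. Below is a minimal Python sketch of that step, assuming the standard GFA layout in which segment lines start with `S` and carry the name and sequence in the second and third tab-separated fields. The function name `gfa_to_fasta` is made up for illustration and is not part of hifiasm or yak.

```python
def gfa_to_fasta(gfa_lines):
    """Convert GFA segment (S) lines to FASTA text.

    Assumes tab-separated GFA where an S line looks like:
    S <name> <sequence> [optional tags...]
    All other line types (links, etc.) are skipped.
    """
    records = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            records.append(f">{fields[1]}\n{fields[2]}\n")
    return "".join(records)


# Tiny made-up example; a real run would iterate over the opened p_utg.gfa file.
example = [
    "S\tutg000001l\tACGT\tLN:i:4\n",
    "L\tutg000001l\t+\tutg000002l\t-\t0M\n",
]
print(gfa_to_fasta(example), end="")
```

The resulting FASTA can then be evaluated against the parental k-mer databases in the same way as the contig FASTA.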
I wanted to update you on the issues I ran into previously. It seems that combining the two PacBio cells caused some issues. It looks like two different individuals were sequenced. When we assembled reads from each cell separately, the results were much better. In addition, we sequenced the parents much deeper, which also improved the assembly. See attached. I think it looks really good now.
Great! Thanks for letting us know.
Are there any parameters I can adjust to improve my HiFi trio-binning assemblies? The hap1 assembly is too big (700m vs 530m), while the hap2 assembly is too small (370m vs 530m). Coverage is ~72X. BUSCO analysis indicates a lot of duplicates (C:98.9% [S:70.9%, D:28.0%], F:0.4%, M:0.7%, n:3285). I tried decreasing -s, but it didn't change much. The yak trioeval summary for the hap1 assembly is below, and I attached the out file. Two factors that you should be aware of:
W 406694 11288140 0.036028
H 670168 11288372 0.059368
N 10281439 1006971 0.089204
hifiasm1.txt
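To make the numbers above easier to read: my understanding is that the W line of the trioeval summary reports the switch error rate and the H line the hamming error rate, each as errors divided by the total in the next column. A minimal Python sketch that recomputes both rates from the counts and compares them with the 1% guideline mentioned in this thread follows. The function name and the threshold variable are illustrative assumptions, not part of yak.

```python
def parse_trioeval_summary(text):
    """Return {'W': rate, 'H': rate} recomputed from the count columns.

    Assumes summary lines of the form: <tag> <errors> <total> <rate>.
    Only the W (switch) and H (hamming) lines are used here.
    """
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] in ("W", "H"):
            errors, total = int(fields[1]), int(fields[2])
            rates[fields[0]] = errors / total
    return rates


summary = """W 406694 11288140 0.036028
H 670168 11288372 0.059368
N 10281439 1006971 0.089204
"""

THRESHOLD = 0.01  # the 1% switch error guideline quoted in this thread
rates = parse_trioeval_summary(summary)
for tag, rate in rates.items():
    status = "above 1%" if rate > THRESHOLD else "ok"
    print(tag, f"{rate:.6f}", status)
```

With these counts the switch error rate comes out at about 3.6% and the hamming error rate at about 5.9%, so both are well above 1%.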