xiekunwhy opened this issue 1 year ago.
Hi Kun,
Sorry, my question has nothing to do with the topic of this issue, but may I ask what tool you use to visualize .hic files? Your plots look better than the images exported from Juicebox.
Hi @yangqimeng99,
I modified a script from EndHiC (https://github.com/fanagislab/EndHiC), matrix2heatmap.py. It accepts HiC-Pro bed and matrix files, not .hic files. If you still have the corresponding BAM file, you can create the bed and matrix files with tools from the HiC-Pro pipeline and then plot the results. This page may help you convert a BAM file to HiC-Pro files (start from section 3.6): https://blog.sciencenet.cn/home.php?mod=space&uid=2970729&do=blog&id=1185463
Best, Kun
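For anyone who wants to plot HiC-Pro output without the modified script, here is a minimal sketch of the loading step. It is not matrix2heatmap.py. It assumes the usual HiC-Pro layouts (bed: chrom, start, end, 1-based bin id; matrix: sparse bin_i, bin_j, count with only one triangle stored). The function names load_hicpro and log_scale are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def load_hicpro(bed_lines: List[str], matrix_lines: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Build a dense, symmetric contact matrix from HiC-Pro-style bed + matrix text.

    Assumed formats:
      bed:    chrom  start  end  bin_id   (bin_id is 1-based)
      matrix: bin_i  bin_j  count         (sparse, one triangle only)
    """
    labels: List[str] = []
    index: Dict[int, int] = {}  # bin_id -> row/column position in the dense matrix
    for line in bed_lines:
        fields = line.split()
        if not fields:
            continue
        chrom, start, end, bin_id = fields[0], fields[1], fields[2], int(fields[3])
        index[bin_id] = len(labels)
        labels.append(f"{chrom}:{start}-{end}")

    n = len(labels)
    dense = [[0.0] * n for _ in range(n)]
    for line in matrix_lines:
        fields = line.split()
        if not fields:
            continue
        i, j, count = index[int(fields[0])], index[int(fields[1])], float(fields[2])
        dense[i][j] = count
        dense[j][i] = count  # mirror the stored triangle so the matrix is symmetric
    return labels, dense


def log_scale(dense: List[List[float]]) -> List[List[float]]:
    """log(1 + count) compresses the huge dynamic range of Hi-C counts for display."""
    return [[math.log1p(v) for v in row] for row in dense]
```

The scaled matrix can then be drawn with matplotlib's imshow; colour map and bin size are where most of the visual difference from Juicebox comes from.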
@xiekunwhy Thank you very much for sharing this and for your suggestions!
Hello Kun,
In 1.2, we are trying to fix the telo-to-telo misjoin problem, which we saw in some plant genomes. This is, however, still under development, so there is no release yet. In your case, the fix does not seem to have worked very well...
There are also some other changes in 1.2, such as better AGP format compatibility and support for input in pairs format.
Best, Chenxi
Hi Chenxi,
Thank you for your reply; I understand the differences now. I need to tell you about another problem.
YaHS tends to misjoin contigs and create more butterfly-shaped connections than EndHiC when anchoring high-quality contigs (the same contigs and the same BAM file were used for both). Contig Nx statistics:
Total: 724212520; Count: 43; Average: 16842151.63; Median: 1939243
N00: 79080472; N10: 79080472; N20: 58646112; N30: 52944404; N40: 51019557; N50: 40339182
N60: 35403800; N70: 34284038; N80: 28762492; N90: 17838659; N100: 124441
YaHS results (yahs1.2a.1 --no-mem-check -o sbi.nd.yahs -q 0 sbi.polish.fa sbi.dedup.bam): scaffold 1 is misjoined, and most of the other scaffolds show butterfly-shaped connections.
EndHiC results: everything looks fine.
Best, Kun
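The Nx table quoted above (and the later advice to check whether contig N90 is large enough) can be reproduced from a list of contig lengths. This is a minimal sketch assuming the common definition: Nx is the length L such that contigs of length at least L together cover at least x% of the total. Under that convention N00 is the longest contig and N100 the shortest, which appears consistent with the report above. The function name nx_stats is hypothetical, not a tool from this thread.

```python
from typing import Dict, Iterable, List


def nx_stats(lengths: Iterable[int], xs: Iterable[int] = range(0, 101, 10)) -> Dict[int, int]:
    """Return {x: Nx} for each requested percentage x (0..100)."""
    ordered: List[int] = sorted(lengths, reverse=True)  # longest contigs first
    total = sum(ordered)
    result: Dict[int, int] = {}
    for x in xs:
        running = 0
        for length in ordered:
            running += length
            # Integer comparison (running/total >= x/100) avoids float rounding at x = 100.
            if running * 100 >= x * total:
                result[x] = length
                break
    return result
```

For example, nx_stats(contig_lengths)[90] gives N90 directly; the contig lengths themselves can be taken from a samtools faidx .fai index (second column).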
Hi Kun,
Thanks for sharing the example. You are right: we have indeed seen this when scaffolding near-complete genome assemblies, and it is exactly the problem we want to solve in version 1.2.
By the way, -q 0 is quite aggressive: it means using all multi-mapping reads, which tends to introduce more assembly errors, especially in repetitive regions. I am not sure, though, whether dropping it will solve the problem in the first scaffold; we did see misjoins with the default setting, which is -q 10.
Best, Chenxi
Hi Chenxi,
I used the HiC-Pro pipeline to map the reads, so low-quality and multi-mapping reads had already been removed when the read 1 and read 2 results were combined. I got exactly the same results using -q 10.
Best, Kun
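Why -q 0 and -q 10 can give identical scaffolds here: a mapping-quality cutoff is a filter, and filtering data that an upstream step has already filtered at an equal or stricter cutoff changes nothing. The sketch below illustrates this on a hypothetical minimal pair record (read name, MAPQ of mate 1, MAPQ of mate 2). It assumes both mates must pass, which is an assumption for illustration; it is not YaHS's or HiC-Pro's actual code. Aligners typically give MAPQ 0 or 1 to reads that map equally well to several places, which is why a cutoff of 0 keeps multi-mappers.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical minimal record: (read_name, mapq_mate1, mapq_mate2)
Pair = Tuple[str, int, int]


def filter_pairs(pairs: Iterable[Pair], min_mapq: int) -> Iterator[Pair]:
    """Keep a read pair only if both mates reach min_mapq (assumed rule).

    min_mapq = 0 keeps everything, including multi-mapping reads.
    """
    for name, q1, q2 in pairs:
        if q1 >= min_mapq and q2 >= min_mapq:
            yield (name, q1, q2)
```

If the upstream cutoff is at least 10, running the result through filter_pairs(..., 10) or filter_pairs(..., 0) returns the same pairs, so the scaffolder sees identical input.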
Hi Chenxi,
I had the same problem using v1.2 on a plant genome: two misjoins, one between the 1st and 2nd scaffolds and one between the 3rd and 4th scaffolds (the expected chromosome number is 9). Is there a fix for this yet?
yahs assembly.fasta trimmed_PAL_046_3_NGS23-B040_BHHC3MDSX7_S441_L002_combined_dedup_HiC.bam -o yahs_rerun
Thanks, Surabhi
@surabhiranavat, if your contig N90 is large enough, try other software, such as EndHiC or HapHiC.
@xiekunwhy Thank you for the suggestion!
Hi,
What are the differences between v1.2 (cloned from the GitHub source and compiled) and v1.2a.1 (installed using conda)?
I got different results with these two versions. It seems that v1.2 (compiled from source) performs worse than v1.2a.1 (installed from conda), because v1.2 tends to generate a wrong connection (scaffold_1).
Here is the result from v1.2 (compiled from the GitHub source):
Here is the result from v1.2a.1 (installed using conda); it is closer to what we expected (I cannot say which species this is).
All parameters were the same when running the two versions.
Best, Kun
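To pinpoint where two runs (for example v1.2 and v1.2a.1) disagree, one option is to compare which contigs each run grouped into the same scaffold, using the AGP file written next to the scaffold FASTA (the exact file name depends on the -o prefix; check the output directory). This sketch assumes standard tab-separated AGP: column 1 is the scaffold, column 5 the component type (N and U are gaps), column 6 the contig id. Scaffold names are ignored because they can be renumbered between runs. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def scaffold_groups(agp_lines: Iterable[str]) -> Set[FrozenSet[str]]:
    """Return one frozenset of contig ids per scaffold in an AGP file."""
    members: Dict[str, List[str]] = defaultdict(list)
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AGP comment/header lines
        cols = line.rstrip("\n").split("\t")
        if cols[4] in ("N", "U"):
            continue  # gap rows carry a gap length, not a contig
        members[cols[0]].append(cols[5])
    return {frozenset(contigs) for contigs in members.values()}


def diff_runs(agp_a: Iterable[str], agp_b: Iterable[str]) -> Tuple[Set[FrozenSet[str]], Set[FrozenSet[str]]]:
    """Contig groups found only in run A, and only in run B."""
    a, b = scaffold_groups(agp_a), scaffold_groups(agp_b)
    return a - b, b - a
```

Grouping by contig sets ignores order and orientation, so it flags joins that differ but not inversions within a scaffold; if a run breaks a contig, its pieces share an id and may show up in more than one group.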