c-zhou / yahs

Yet another Hi-C scaffolding tool
MIT License

Difference between v1.2 and v1.2a.1? #71

Open xiekunwhy opened 1 year ago

xiekunwhy commented 1 year ago

Hi,

What is the difference between v1.2 (cloned from the GitHub source and compiled) and v1.2a.1 (installed using conda)?

I got different results from these two versions. It seems that v1.2 (compiled from the GitHub source) performs worse than v1.2a.1 (installed using conda), because v1.2 tends to generate wrong connections (see scaffold_1).

Here is the v1.2 (compiled from the GitHub source) result: image

Here is the v1.2a.1 (installed using conda) result. This result is closer to what we expected (I cannot tell you what the species is). image

All parameters are the same when running these two versions.

Best, Kun

yangqimeng99 commented 1 year ago

Hi Kun, sorry, my question has nothing to do with the topic of this issue. May I ask what tool you use to visualize .hic files? It looks better than the images exported from Juicebox.

xiekunwhy commented 1 year ago

Hi @yangqimeng99 ,

I modified a script from EndHiC (https://github.com/fanagislab/EndHiC), matrix2heatmap.py. It accepts HiC-Pro bed and matrix files, not .hic files. If you still have the corresponding BAM file, you can create the bed and matrix files using tools in the HiC-Pro pipeline and then plot the results. This page may help you convert BAM to HiC-Pro files (start from 3.6): https://blog.sciencenet.cn/home.php?mod=space&uid=2970729&do=blog&id=1185463
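For readers unfamiliar with the HiC-Pro files mentioned above, here is a minimal sketch of how they can be read into a dense contact matrix before plotting. It is not the modified matrix2heatmap.py script, which is not shown in this thread. It assumes the usual HiC-Pro layout: a bed file with `chrom start end bin_id` (1-based bin ids) and a sparse matrix file with `bin_i bin_j count`. The function names and the toy data are made up for illustration.

```python
def load_hicpro_bins(bed_lines):
    """Map 1-based bin id -> (chrom, start, end) from HiC-Pro bed lines."""
    bins = {}
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, end, bin_id = line.split()[:4]
        bins[int(bin_id)] = (chrom, int(start), int(end))
    return bins


def load_hicpro_matrix(matrix_lines, n_bins):
    """Expand sparse 'bin_i bin_j count' triplets into a symmetric dense matrix."""
    dense = [[0.0] * n_bins for _ in range(n_bins)]
    for line in matrix_lines:
        if not line.strip():
            continue
        i, j, count = line.split()[:3]
        i, j, count = int(i) - 1, int(j) - 1, float(count)
        # HiC-Pro stores one triangle only, so mirror each entry.
        dense[i][j] = count
        dense[j][i] = count
    return dense


# Toy input (hypothetical values), three bins on one contig.
bed_lines = [
    "ctg1\t0\t100000\t1",
    "ctg1\t100000\t200000\t2",
    "ctg1\t200000\t250000\t3",
]
matrix_lines = [
    "1\t1\t12",
    "1\t2\t5",
    "2\t3\t7",
]

bins = load_hicpro_bins(bed_lines)
dense = load_hicpro_matrix(matrix_lines, len(bins))
```

The resulting `dense` list of lists can then be passed to any heatmap routine, for example matplotlib's `imshow`, usually after a log transform of the counts.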

Best, Kun

yangqimeng99 commented 1 year ago

Thank you very much for your sharing and suggestions! @xiekunwhy

c-zhou commented 1 year ago

Hello Kun,

In 1.2, we are trying to fix the telo-to-telo misjoin problem, which we saw for some plant genomes. This is, however, still under development, so there is no release yet. In your case, the fix does not seem to have been very successful...

There are also some extra changes in 1.2, such as better AGP format compatibility and pair format input.

Best, Chenxi

xiekunwhy commented 1 year ago

Hi Chenxi,

Thank you for your reply. I understand the differences now. I need to tell you about another problem.

YaHS tends to misjoin and create more butterfly connections than EndHiC when anchoring high-quality contigs (same contigs and same BAM file used). Contig Nx statistics:

Total: 724212520
Count: 43
Average: 16842151.63
Median: 1939243
N00: 79080472
N10: 79080472
N20: 58646112
N30: 52944404
N40: 51019557
N50: 40339182
N60: 35403800
N70: 34284038
N80: 28762492
N90: 17838659
N100: 124441
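As a reminder of how Nx values like the ones above are derived, here is a minimal sketch using the standard definition: sort contigs from longest to shortest and report the length of the contig at which the running sum first reaches x% of the total. The function name `nx` and the toy lengths are illustrative only; the tool that produced the statistics above is not named in this thread.

```python
def nx(lengths, x):
    """Return the Nx contig length for x in the range 0..100."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


# Toy contig lengths (hypothetical), total = 100.
toy_lengths = [40, 30, 20, 10]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
```

Under this definition N00 is the longest contig and N100 is the shortest, which is consistent with N00 and N10 being equal in the statistics above.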

The YaHS results (yahs1.2a.1 --no-mem-check -o sbi.nd.yahs -q 0 sbi.polish.fa sbi.dedup.bam): scaffold 1 is a misjoined scaffold, and most of the other scaffolds are butterfly-connected. image

The EndHiC results, where everything seems fine: image

Best, Kun

c-zhou commented 1 year ago

Hi Kun,

Thanks for showing the example. You are right: we have indeed seen this when scaffolding near-complete genome assemblies, and it is exactly the problem we want to solve in version 1.2.

By the way, -q 0 is quite aggressive: it means all multi-mapping reads are used, which tends to introduce more assembly errors, especially in repetitive regions. I am not sure, though, whether dropping it will solve the problem in the first scaffold. We did see misjoins with the default setting, which is -q 10.
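To illustrate what a mapping-quality threshold does, here is a minimal sketch that filters SAM-format records by MAPQ, which is the fifth tab-separated column in the SAM specification. It is not YaHS code. It assumes that a record is kept when its MAPQ is at least the threshold, and the helper names and the three toy records are hypothetical.

```python
def mapq_of(sam_line):
    """Return the MAPQ (5th SAM column) of one alignment record."""
    fields = sam_line.rstrip("\n").split("\t")
    return int(fields[4])


def keep_records(sam_lines, min_mapq):
    """Keep alignment records whose MAPQ is at least min_mapq (assumed rule)."""
    return [
        line
        for line in sam_lines
        if not line.startswith("@") and mapq_of(line) >= min_mapq
    ]


def make_record(name, mapq):
    """Build a toy 11-column SAM record; only the MAPQ matters here."""
    return "\t".join(
        [name, "0", "ctg1", "100", str(mapq), "10M", "*", "0", "0",
         "ACGTACGTAC", "IIIIIIIIII"]
    )


records = [
    "@HD\tVN:1.6",                # header line, always skipped
    make_record("read_a", 60),    # uniquely mapped
    make_record("read_b", 0),     # MAPQ 0, typical of multi-mapping reads
    make_record("read_c", 10),
]

kept_q0 = keep_records(records, 0)
kept_q10 = keep_records(records, 10)
```

With a threshold of 0 all three alignments pass, including the MAPQ 0 read; with a threshold of 10 only two remain. Many aligners assign MAPQ 0 to reads that map equally well to several places, which is why a zero threshold lets repeat-derived contacts through.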

Best, Chenxi

xiekunwhy commented 1 year ago

Hi Chenxi,

I used the HiC-Pro pipeline to map the reads; low-quality and multi-mapping reads were removed when combining the read 1 and read 2 results. I also got exactly the same results using -q 10.

Best, Kun

surabhiranavat commented 8 months ago

Hi Chenxi,

I had the same problem with two misjoins between the 1st and 2nd scaffold, and the 3rd and 4th scaffold (expected chromosome number is 9) using v1.2 for a plant genome. Is there a fix for this yet?

yahs assembly.fasta trimmed_PAL_046_3_NGS23-B040_BHHC3MDSX7_S441_L002_combined_dedup_HiC.bam -o yahs_rerun

p_elata_out_JBAT_rerun_hic

Thanks, Surabhi

xiekunwhy commented 8 months ago

@surabhiranavat, if the contig N90 is large enough, try other software, such as EndHiC or HapHiC.

surabhiranavat commented 8 months ago

@xiekunwhy Thank you for the suggestion!