fanagislab / EndHiC

EndHiC is a fast and easy-to-use Hi-C scaffolding tool that uses the Hi-C links from contig end regions, rather than whole contig regions, to assemble large contigs into chromosome-level scaffolds.

how to use asm_error_check.pl results to continue the pipeline? #9

Open xiekunwhy opened 1 year ago

xiekunwhy commented 1 year ago

Hi,

How do I use the results of asm_error_check.pl to continue the pipeline? Do we need to break the FASTA sequences and restart from HiC-Pro read mapping?

Best, Kun

fanagislab commented 1 year ago

Wangsen, Could you answer this question?


kingforest93 commented 1 year ago

Hi Kun,

Note that contig breaking based on Hi-C data alone is not very reliable. We suggest you determine the exact positions of contig assembly errors from multiple sources of evidence, such as the Hifiasm graph structure, the within-contig Hi-C heatmap, etc.

If you already know the positions at which to break the contigs, there is no need to rerun the whole HiC-Pro mapping pipeline. For example, if you trust the predictions of asm_error_check.pl, extract the contigs and their break positions, then use the scripts split_len.pl, split_bed.pl, and split_fasta.pl to break the corresponding contigs.len, hic.bed, and contigs.fa files. The newly generated files (contigs.splited.len and hic.splited.bed), together with the original hic.matrix, can be used as input to endhic.pl.

    perl split_len.pl ctg_split.pos contigs_all.len > contigs_all.ctg_splited.len
    perl split_bed.pl ctg_split.pos hic_10000_abs.bed > hic_10000_abs_splited.bed
    perl split_fasta.pl ctg_split.pos contigs.fasta > contigs_splited.fasta

The three scripts are attached below as a ZIP file.

splt_ctg.zip
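To illustrate what the splitting step does conceptually, here is a minimal Python sketch that cuts a contig sequence at a list of break positions. It is not the attached Perl code. The format of ctg_split.pos is not described in this thread, so the two-column "contig position" layout, the 0-based cut coordinates, and the `_1`, `_2` piece-naming scheme below are all assumptions made for illustration only; check the attached scripts for the real conventions.

```python
from collections import defaultdict


def parse_breaks(lines):
    """Parse ASSUMED 'contig<whitespace>position' lines into {contig: sorted positions}."""
    breaks = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, pos = line.split()[:2]
        breaks[name].append(int(pos))
    return {name: sorted(set(positions)) for name, positions in breaks.items()}


def split_intervals(length, positions):
    """Return 0-based half-open (start, end) pieces of a contig cut at the given positions."""
    # Ignore cuts that fall outside the contig or exactly on its ends.
    cuts = sorted({p for p in positions if 0 < p < length})
    edges = [0] + cuts + [length]
    return list(zip(edges[:-1], edges[1:]))


def split_contig(name, seq, positions):
    """Yield (piece_name, subsequence) pairs; the piece names are a hypothetical convention."""
    pieces = split_intervals(len(seq), positions)
    if len(pieces) == 1:
        # No valid break position: keep the contig unchanged.
        yield name, seq
        return
    for index, (start, end) in enumerate(pieces, 1):
        yield f"{name}_{index}", seq[start:end]


if __name__ == "__main__":
    breaks = parse_breaks(["ctg1\t4", "ctg1\t8"])
    for piece_name, piece_seq in split_contig("ctg1", "AAAACCCCGG", breaks["ctg1"]):
        print(piece_name, len(piece_seq), piece_seq)
```

The same intervals give the new contig lengths (end minus start), which is the information a split .len file would need to carry. Remapping the bins in the .bed file is not covered by this sketch.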

xiekunwhy commented 1 year ago

Thanks for your reply and suggestions. I will try it.

Best, Kun