ShunOuchi / GreenHill

De novo chromosome-level scaffolding and phasing tool using Hi-C
GNU General Public License v3.0

GapClose Error with HiC data in Platanus-allele-2.2.2_modified #35

Closed (zilov closed this issue 1 month ago)

zilov commented 2 months ago

Hello! I'm using GreenHill to obtain a haplotype-resolved assembly for a diploid plant species. I initially ran it directly on my polished NextDenovo assembly but obtained unexpectedly short haplotype contigs. Therefore, I'm now testing it with Platanus-allele results.

Data that I have:

I attempted to run platanus-allele-2.2.2_modified from this repository to leverage the HiC data support in the phasing step. However, I encountered an error during the platanus-allele phase GapClose step of the pipeline. The GapCloseLog file shows the following message:

K=32, making hash table...
[PAIR_LIBRARY 1]
mapping reads...
Each dot below indicates 10M reads processed.
...................................................
TOTAL_PAIR = 257947068
MAPPED_IN_SAME_CONTIG = 0 (0)
Error(6): Error, Kmer mapping exception!!
No read mapped in the same contig!!
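
For anyone scanning long GapClose logs for this condition, here is a minimal sketch that pulls the two counters out of log text. It assumes the counters appear exactly as in the excerpt above (`TOTAL_PAIR = N` and `MAPPED_IN_SAME_CONTIG = N (...)`) and reads only the first occurrence of each, so a log with several `PAIR_LIBRARY` sections would need a loop. The function name `mapped_pair_counts` is hypothetical, not part of Platanus-allele.

```python
import re


def mapped_pair_counts(log_text):
    """Return (total_pairs, mapped_in_same_contig) from log text, or None.

    Assumes the line format shown in the log excerpt above; only the
    first occurrence of each counter is read.
    """
    total = re.search(r"TOTAL_PAIR\s*=\s*(\d+)", log_text)
    mapped = re.search(r"MAPPED_IN_SAME_CONTIG\s*=\s*(\d+)", log_text)
    if total is None or mapped is None:
        return None
    return int(total.group(1)), int(mapped.group(1))


# The two counter lines copied from the log excerpt above.
example_log = "TOTAL_PAIR = 257947068\nMAPPED_IN_SAME_CONTIG = 0 (0)\n"

counts = mapped_pair_counts(example_log)
if counts is not None and counts[1] == 0:
    print("No read pair mapped within a single contig; check the paired-end inputs.")
```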

I will try using the non-modified version of Platanus-allele and provide an update. Could this error be related to the HiC data integration? Has anyone else experienced similar issues?

UPDATE

I got the same error with the unmodified Platanus-allele 2.2.2.

ShunOuchi commented 2 months ago

Sorry for the late reply. I have never encountered such an issue. The direct cause is that no paired-end reads (PEs) were mapped to the same contig, but that should be unlikely. Can you show me all the logs and the stats of the intermediate results from the previous round (round?/out_primaryBubble.fa, round?/out_secondaryBubble.fa, round?/out_nonBubbleOther.fa)?

zilov commented 2 months ago

Hello! Here are the logs and QUAST statistics for the FASTA files. Files that are not listed in the stats are empty or do not have contigs longer than 500 bp.

phaase_boecheraa.gapClose.log phaase_boecheraa.solveDBG.log quast_report.txt

ShunOuchi commented 2 months ago

Thank you.

Are the paired-end inputs specified incorrectly? R1 and R2 look the same.

-IP1 /media/eternus1/data/plants/boechera_falcata/raw_reads/illumina/4-2_1.fastq.gz /media/eternus1/data/plants/boechera_falcata/raw_reads/illumina/4-2_1.fastq.gz
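
A quick pre-flight check can catch this kind of mix-up before a long run. The sketch below only tests whether the two paths given for a pair resolve to the same file; it does not inspect read contents. The helper name `same_file` and the file names in the usage lines are hypothetical.

```python
import os


def same_file(r1, r2):
    """Return True if both paths resolve to the same file on disk.

    Paths are normalized with os.path.realpath, so symlinks and relative
    paths pointing at the same file are also caught.
    """
    return os.path.realpath(r1) == os.path.realpath(r2)


# Hypothetical file names, for illustration only.
if same_file("reads_1.fastq.gz", "reads_1.fastq.gz"):
    print("R1 and R2 point to the same file; check the -IP1 arguments.")
```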

zilov commented 2 months ago

Thank you for your assistance with the previous error. I have re-run Platanus-allele-2.2.2_modified with the correct R1 and R2 files, and the first round of the "phase" step completed successfully.

However, I encountered a new error during the second round of the "solveDBG" step. The *.longReadAlignment file appears to be empty in this round. I have attached the logs from the run and the size information for the files generated in the first round for your reference.

Could you advise on how to proceed? Would reducing the number of iterations be a potential solution, or are there other recommendations you might have?

Thank you again for your time and support.

image phaase_boecheraa.solveDBG.log

ShunOuchi commented 2 months ago

The error occurs because minimap2 is either not installed or not on your PATH.

sh: 1: minimap2: not found

Please install minimap2 and make sure it is on your PATH.
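
One way to confirm this before re-running is a small check with Python's standard `shutil.which`, which looks up an executable the same way the shell does. This is only a sketch; the helper name `find_tool` is hypothetical, and it must be run in the same environment (same shell, same activated conda or module setup) that launches the pipeline.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("minimap2")
    if path is None:
        print("minimap2 not found on PATH")
    else:
        print(f"minimap2 found at {path}")
```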

zilov commented 1 month ago

Thank you for your help! I finally got the results, so I am closing the issue.