KolmogorovLab / hapdup

Pipeline to convert a haploid assembly into diploid

Subprocess flye --polish-target fails #19

Closed elcortegano closed 1 year ago

elcortegano commented 1 year ago

Hi, I'm running version 0.9 of the Docker image on a Flye assembly generated from PacBio HiFi reads, and I'm getting the following error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/dist-packages/hapdup/main.py", line 233, in run_flye_hp
    subprocess.check_call(" ".join(flye_cmd), shell=True)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'flye --polish-target assembly.fasta --pacbio-hifi hapdup/margin/MARGIN_PHASED.haplotagged.bam -t 32 -o hapdup/flye_hap_1 --polish-haplotypes 1 2>/dev/null' returned non-zero exit status 1.

This is followed by other errors, including:

No such file or directory: 'hapdup/flye_hap_1/polished_1.fasta'

In the end, fasta files are generated:

hapdup/flye_hap_2/polished_1.fasta
hapdup/flye_hap_1/bubbles_1.fasta

But I don't think I'm getting all the expected output, nor am I confident that these files were generated without issue. What could be causing the error above?

hapdup was run as follows:

docker run -v $PWD:$PWD -u `id -u`:`id -g` mkolmogo/hapdup:0.9   hapdup --assembly $PWD/assembly.fasta --bam $PWD/reads_assembly.bam --out-dir $PWD/hapdup -t 64 --rtype hifi

The assembly.fasta file is a Flye-generated assembly (built with --pacbio-hifi), and the alignments were generated from the reads with minimap2 -ax map-pb. The assembly is expected to be highly homozygous, since it comes from a long-established inbred line.
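
The failing command in the traceback ends with 2>/dev/null, so Flye's own error message is discarded and only the exit status reaches hapdup. A minimal sketch for surfacing that message, assuming you can re-run the same command inside the container by hand (both helper names below are hypothetical and not part of hapdup):

```python
import subprocess


def strip_stderr_redirect(cmd: str) -> str:
    """Return cmd without the '2>/dev/null' redirect, so stderr is kept."""
    return cmd.replace("2>/dev/null", "").strip()


def run_and_capture_stderr(cmd: str):
    """Run cmd through the shell and return (exit_code, stderr_text).

    Hypothetical helper for debugging; hapdup itself uses
    subprocess.check_call, which only reports the exit status.
    """
    result = subprocess.run(cmd, shell=True, stderr=subprocess.PIPE, text=True)
    return result.returncode, result.stderr
```

Passing the command string from the CalledProcessError message through strip_stderr_redirect and then run_and_capture_stderr should return whatever Flye wrote to stderr before it exited.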

mikolmogorov commented 1 year ago

@elcortegano I suspect phasing will not work very well in a homozygous genome. Polishing relies on phased sets of reads, and if only a small fraction of the reads were phased, the polisher code will likely fail.
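
To check how many reads were actually phased, one option is to count the primary alignments that carry a haplotype tag in the haplotagged BAM. The sketch below assumes that phased reads are marked with the standard HP:i: SAM tag (an assumption about Margin's output, not something stated in this thread) and that it is fed SAM text, for example from samtools view. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def phased_fraction(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Count primary mapped reads with an HP:i: tag in SAM text.

    Returns (phased_reads, total_reads, fraction). Assumes phased reads
    carry the HP tag; untagged reads are treated as unphased.
    """
    total = 0
    phased = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # Optional tags start at the 12th column.
        if any(tag.startswith("HP:i:") for tag in fields[11:]):
            phased += 1
    fraction = phased / total if total else 0.0
    return phased, total, fraction
```

In practice the lines could come from piping samtools view hapdup/margin/MARGIN_PHASED.haplotagged.bam into a small script that calls phased_fraction(sys.stdin). A very low fraction would be consistent with the failure described above.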

hapdup expects that most of the genome can be phased. If you want to recover alternative alleles in a highly inbred genome, hifiasm (with alternative contig output) might be a better option.

elcortegano commented 1 year ago

Will use hifiasm for this then, thanks for the feedback!