Illumina / hap.py

Haplotype VCF comparison tools

Docker Implementation: Several Error Messages Related to "preprocess" #178

Closed cwarden45 closed 1 year ago

cwarden45 commented 1 year ago

Hi,

I am currently using the Docker image described on the main page (pkrusche/hap.py) to run hap.py.

I am running the following command (for some public HG002 data):

TEST=../../copied_files/analysis/small_variants_happy/hg002_hac_happy_out/hac_happy_out.vcf.gz
REF=../../../../PrecisionFDA/GIAB-hg38_latest-230726/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
OUT=hg002_hac_happy-vs-GAIB_v4.2.1
FA=../GCA_000001405.15_GRCh38_no_alt_analysis_set.fa
BED=../../../../PrecisionFDA/GIAB-hg38_latest-230726/SupplementaryFiles/HG002_GRCh38_1_22_v4.2.1_callablemultinter_gt0.bed.gz

/opt/hap.py/bin/hap.py $REF $TEST -f $BED -o GIAB_CALLABLE_$OUT -r $FA

I have attached a text file with the full output (captured with &> out.log), along with the associated .json file (as a .zip file):

GIAB_CALLABLE_hg002_hac_happy-vs-GAIB_v4.2.1.runinfo.zip

out.log

As far as I understand, the issue relates to the following error messages:

One example of an error message that I believe appears many times (with variation in the exact parallel command):

2023-08-02 17:14:29,232 ERROR    Preprocess command preprocess /tmp/tmpjpVvV1.vcf.gz:* -l chr13:25603406-33390761 -o /tmp/input.chr13:25603406-33390761YRx3Gx.prep.vcf.gz -V 1 -L 1 -r ../GCA_000001405.15_GRCh38_no_alt_analysis_set.fa failed. Outputs are here /tmp/stdoutyoQZzk.log / /tmp/stderrqjwmOg.log
2023-08-02 17:14:29,234 ERROR    preprocess: vcf.c:3472: bcf_update_format: Assertion `!fmt->p_free' failed.
2023-08-02 17:14:29,234 ERROR    Aborted (core dumped)
2023-08-02 17:14:29,236 ERROR    Exception when running <function preprocessWrapper at 0x7f7c04faa140>:
2023-08-02 17:14:29,237 ERROR    ------------------------------------------------------------
2023-08-02 17:14:29,238 ERROR    Traceback (most recent call last):
2023-08-02 17:14:29,238 ERROR      File "/opt/hap.py/lib/python27/Tools/parallel.py", line 72, in parMapper
2023-08-02 17:14:29,240 ERROR        return arg[1]['fun'](arg[0], *arg[1]['args'], **arg[1]['kwargs'])
2023-08-02 17:14:29,241 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 67, in preprocessWrapper
2023-08-02 17:14:29,242 ERROR        subprocess.check_call(to_run, shell=True, stdout=tfo, stderr=tfe)
2023-08-02 17:14:29,243 ERROR      File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
2023-08-02 17:14:29,246 ERROR        raise CalledProcessError(retcode, cmd)
2023-08-02 17:14:29,248 ERROR    CalledProcessError: Command 'preprocess /tmp/tmpjpVvV1.vcf.gz:* -l chr13:25603406-33390761 -o /tmp/input.chr13:25603406-33390761YRx3Gx.prep.vcf.gz -V 1 -L 1 -r ../GCA_000001405.15_GRCh38_no_alt_analysis_set.fa' returned non-zero exit status 134
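Because the same failure is reported once per parallel job, it can help to list which regions failed. The following is a minimal sketch, assuming the log lines have the same shape as the "Preprocess command ... failed" line quoted above. The function names `failed_regions` and `summarize` are hypothetical and are not part of hap.py. As a side note, exit status 134 is what a shell typically reports for a process terminated by SIGABRT (128 + 6), which is consistent with the failed assertion.

```python
import re
from collections import Counter

# Matches the "-l <region>" argument in log lines of the form
# "... ERROR    Preprocess command preprocess ... -l <region> ... failed."
# (pattern is an assumption based on the single example line in this issue).
FAILED_RE = re.compile(
    r"ERROR\s+Preprocess command preprocess .*? -l (\S+) .*failed\."
)


def failed_regions(log_lines):
    """Count how often each region appears in a failed preprocess command."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize(path="out.log"):
    """Print each failing region and its count from a captured log file."""
    with open(path) as handle:
        for region, n in sorted(failed_regions(handle).items()):
            print(region, n)
```

Calling `summarize("out.log")` on the captured log would print one line per failing region. If every region fails, that points to a problem with the input file as a whole and not to a single problematic record.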

Final Error Message:

2023-08-02 20:03:27,093 ERROR    One of the preprocess jobs failed
2023-08-02 20:03:27,094 ERROR    Traceback (most recent call last):
2023-08-02 20:03:27,094 ERROR      File "/opt/hap.py/bin/hap.py", line 508, in <module>
2023-08-02 20:03:27,095 ERROR        main()
2023-08-02 20:03:27,095 ERROR      File "/opt/hap.py/bin/hap.py", line 363, in main
2023-08-02 20:03:27,095 ERROR        "QUERY")
2023-08-02 20:03:27,096 ERROR      File "/opt/hap.py/bin/pre.py", line 203, in preprocess
2023-08-02 20:03:27,096 ERROR        haploid_x=gender == "male")
2023-08-02 20:03:27,096 ERROR      File "/opt/hap.py/lib/python27/Haplo/partialcredit.py", line 214, in partialCredit
2023-08-02 20:03:27,097 ERROR        raise Exception("One of the preprocess jobs failed")
2023-08-02 20:03:27,097 ERROR    Exception: One of the preprocess jobs failed

Can you please help me troubleshoot this issue?

Thank you very much!

Sincerely, Charles

cwarden45 commented 1 year ago

I am closing this ticket because I believe this is due to the formatting of one of the VCF files.

To help avoid the issue for others, the VCF was downloaded from the link below (and is labeled as the output of hap.py):

https://labs.epi2me.io/askenazi-kit14-2022-12/

I am closing the ticket because hap.py ran successfully with the following inputs:

TEST=../../GIAB-hg38_latest-230726/SupplementaryFiles/inputvcfsandbeds/HG002_GRCh38_1_22_PacBio_HiFi_GATK4.vcf.gz
REF=../../GIAB-hg38_latest-230726/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
OUT=GIAB_PacBio_GATK-vs-GAIB_v4.2.1
FA=../../../EPI2ME/giab_lsk114_2022.12/additional_analysis/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa
BED=../../GIAB-hg38_latest-230726/SupplementaryFiles/HG002_GRCh38_1_22_v4.2.1_callablemultinter_gt0.bed.gz

In other words, I believe the issue was caused by using a VCF with two sample columns ("TRUTH" and "QUERY") as the query input, whereas the VCFs that ran successfully have a single genotype column ("HG002").
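A quick way to check for this before running hap.py is to look at the sample columns on the `#CHROM` header line. The following is a minimal sketch, assuming a standard tab-delimited VCF header. The function names `sample_names_from_header` and `vcf_sample_names` are hypothetical helpers and are not part of hap.py.

```python
import gzip


def sample_names_from_header(lines):
    """Return the sample column names from the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\r\n").split("\t")
            # Columns 1-8 are fixed and column 9 is FORMAT; samples follow.
            return fields[9:]
        if not line.startswith("#"):
            break  # reached the records without seeing a #CHROM line
    raise ValueError("no #CHROM header line found")


def vcf_sample_names(path):
    """Open a plain or gzip/bgzip-compressed VCF and list its samples."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sample_names_from_header(handle)
```

For a file that is itself hap.py output, `vcf_sample_names(path)` would be expected to return two names such as `TRUTH` and `QUERY`, while a single-sample call set would return one name such as `HG002`.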

If it is helpful for others for me to share any additional information, then I am happy to do so.