genepi / imputationserver2

Imputation Pipeline Execution Fails Shortly After Starting #39

Closed honda-s691470 closed 2 weeks ago

honda-s691470 commented 2 weeks ago

I'm encountering a problem with the Michigan Imputation Server (v2.0.6). Although my VCF files pass all checkVCF.py checks without error, the imputation pipeline fails about a minute and a half into the run with the message: "Pipeline execution failed." Is there anything in the setup below that could explain the failure, or are additional steps (such as VCF format adjustments) recommended?

OUTPUT_DIR="/XXX"
mkdir -p $OUTPUT_DIR

plink --tfile $PLINK_INPUT \
  --geno 0.02 --mind 0.02 --maf 0.01 --hwe 0.0001 \
  --make-bed --out ${OUTPUT_DIR}/qc_filtered_data

plink --bfile ${OUTPUT_DIR}/qc_filtered_data --freq --out ${OUTPUT_DIR}/qc_filtered_data

wget https://www.chg.ox.ac.uk/~wrayner/tools/HRC-1000G-check-bim-v4.3.0.zip -P $OUTPUT_DIR
unzip -o $OUTPUT_DIR/HRC-1000G-check-bim-v4.3.0.zip -d $OUTPUT_DIR

perl ${OUTPUT_DIR}/HRC-1000G-check-bim.pl -b ${OUTPUT_DIR}/qc_filtered_data.bim \
  -f ${OUTPUT_DIR}/qc_filtered_data.frq \
  -r ${OUTPUT_DIR}/HRC.r1-1.GRCh37.wgs.mac5.sites.tab -h
cd $OUTPUT_DIR

bash Run-plink.sh

OUTPUT_DIR2="/YYY"
for chr in {1..22}; do
  plink --bfile ${OUTPUT_DIR}/qc_filtered_data-updated-chr${chr} --real-ref-alleles --recode vcf --out ${OUTPUT_DIR2}/qc_filtered_data-updated-chr${chr}
  bcftools sort ${OUTPUT_DIR2}/qc_filtered_data-updated-chr${chr}.vcf -Oz -o ${OUTPUT_DIR2}/qc_filtered_data-updated-chr${chr}.vcf.gz
  bcftools index ${OUTPUT_DIR2}/qc_filtered_data-updated-chr${chr}.vcf.gz
done
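Before uploading, it can help to confirm that each per-chromosome file really holds a single chromosome and is sorted by position, which is what the server's documentation asks for as far as I understand it. The following is a minimal sketch, not part of the original script: `check_vcf_order` is a hypothetical helper name, and it only inspects the CHROM and POS columns of an uncompressed VCF read from standard input.

```shell
# Hypothetical helper (not from the original script): reads an uncompressed
# VCF on stdin and checks that all records share one chromosome and that
# positions never decrease. Prints a one-line summary; returns 1 on failure.
check_vcf_order() {
    awk -F'\t' '
        /^#/ { next }                      # skip meta and header lines
        chrom == "" { chrom = $1 }         # remember the first chromosome seen
        $1 != chrom {
            printf "multiple chromosomes: %s and %s\n", chrom, $1
            bad = 1; exit
        }
        ($2 + 0) < prev {
            printf "unsorted at %s:%s\n", $1, $2
            bad = 1; exit
        }
        { prev = $2 + 0; n++ }             # track last position and record count
        END {
            if (bad) exit 1
            printf "ok: %d records\n", n
        }
    '
}
```

One possible usage, assuming the file names from the loop above: `gzip -dc ${OUTPUT_DIR2}/qc_filtered_data-updated-chr1.vcf.gz | check_vcf_order`. This does not check allele coding or the genome build, so a file that passes here can still be rejected by the server for other reasons.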

honda-s691470 commented 2 weeks ago

I was able to successfully run the imputation job on the Michigan Imputation Server after removing the line:

plink --bfile ${OUTPUT_DIR}/qc_filtered_data-updated-chr${chr} --real-ref-alleles --recode vcf --out ${OUTPUT_DIR2}/qc_filtered_data-updated-chr${chr}

It appears that the duplicate use of --real-ref-alleles interfered with VCF preparation and may have compromised the integrity of the output files. With this line removed, the job ran to completion without errors.

Thank you for the assistance. I am closing this issue as it is now resolved.
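For anyone who lands here with the same error, the loop without the extra plink call might look roughly like the sketch below. It assumes, as the fix described above implies, that per-chromosome `.vcf` files already exist after `bash Run-plink.sh`. The function name `compress_chr_vcfs`, its arguments, and the skip-if-missing behaviour are illustrative additions and not the exact commands that were run.

```shell
# Sketch only: compress_chr_vcfs, in_dir, out_dir and prefix are hypothetical.
# Assumes one uncompressed VCF per chromosome already exists in in_dir,
# named <prefix>-chr<N>.vcf (no second plink --recode vcf step).
compress_chr_vcfs() {
    in_dir=$1
    out_dir=$2
    prefix=${3:-qc_filtered_data-updated}
    mkdir -p "$out_dir" || return 1
    for chr in $(seq 1 22); do
        vcf="${in_dir}/${prefix}-chr${chr}.vcf"
        gz="${out_dir}/${prefix}-chr${chr}.vcf.gz"
        # Skip chromosomes with no (or empty) input instead of failing later
        if [ ! -s "$vcf" ]; then
            echo "skipping chr${chr}: ${vcf} not found or empty" >&2
            continue
        fi
        # Same bcftools calls as in the original loop: sort, bgzip, then index
        bcftools sort "$vcf" -Oz -o "$gz" || return 1
        bcftools index "$gz" || return 1
    done
}
```

It could be called as `compress_chr_vcfs "$OUTPUT_DIR" "$OUTPUT_DIR2"`. Stopping at the first bcftools failure is a deliberate choice here, so that a broken chromosome is noticed before a partial set of files is uploaded.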