PharmGKB / PharmCAT

The Pharmacogenomic Clinical Annotation Tool
Mozilla Public License 2.0

Error running on Pharmacoscan data #176

Closed anh151 closed 7 months ago

anh151 commented 7 months ago

Hello,

PharmCAT/preprocessor: v2.9.0
Python: 3.9.13
JDK: jdk-17.0.4.1+1
Environment: Linux
Bcftools/bgzip/htslib: v1.18

Sorry for always coming with issues. I am setting up a pipeline in our lab/dept that uses PharmCAT to regularly process Pharmacoscan data. I'm running into an error that I haven't come across before, and I'm not sure whether it is a PharmCAT error or an issue with our data.

Command used

cd /ihome/pempey/anh151/pharmcat_bin/preprocessor && python3 -m pipenv run python /ihome/pempey/anh151/pharmcat_bin/preprocessor/pharmcat_pipeline /scratch/slurm-2980222/tmps77gvulf/vc_qced.vcf.gz --missing-to-ref -o /ihome/pempey/anh151/results -matcher --research-mode cyp2d6 -cp 63 
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/scratch/slurm-2980222/ipykernel_41182/198163206.py in <module>
----> 1 call_haplotypes(dir_='vcfs', hwe=0.001, variant_call_rate=0.95, sample_call_rate=0.95, debug=True)

/scratch/slurm-2980222/ipykernel_41182/3566189482.py in call_haplotypes(dir_, hwe, variant_call_rate, sample_call_rate, output_dir, debug)
    345                     output_dir = os.path.join(os.getcwd(), output_dir)
    346             maybe_create_dir(output_dir)
--> 347             run_pharmcat(vcf_input=vcf_input, output_dir=output_dir)
    348             move_outputs(vcf_input=vcf_input, tempdir=tempdir, output_dir=output_dir)
    349             write_manifest(

/scratch/slurm-2980222/ipykernel_41182/3566189482.py in run_pharmcat(vcf_input, output_dir)
     78     pharmcat_pipeline_path = os.path.join(preprocessor_path, "pharmcat_pipeline")
     79     num_threads = os.cpu_count() - 1 if os.cpu_count() != 1 else 1
---> 80     run_command(
     81         f"cd {preprocessor_path} && {python_path} -m pipenv run python {pharmcat_pipeline_path} {vcf_input} --missing-to-ref -o {output_dir} -matcher --research-mode cyp2d6 -cp {num_threads}",
     82         shell=True,

/scratch/slurm-2980222/ipykernel_41182/3097037154.py in run_command(args, shell, capture_output)
     33 def run_command(args, shell=False, capture_output=False):
     34     stdout = subprocess.run(args, capture_output=True, text=True, shell=shell)
---> 35     check_stdout(stdout)
     36     if capture_output:
     37         return stdout.stdout

/scratch/slurm-2980222/ipykernel_41182/3097037154.py in check_stdout(stdout)
     40 def check_stdout(stdout):
     41     if stdout.returncode != 0:
---> 42         raise RuntimeError(
     43             f"Error running command {stdout.args} with error: {stdout.stderr}"
     44         )

RuntimeError: Error running command cd /ihome/pempey/anh151/pharmcat_bin/preprocessor && python3 -m pipenv run python /ihome/pempey/anh151/pharmcat_bin/preprocessor/pharmcat_pipeline /scratch/slurm-2980222/tmps77gvulf/vc_qced.vcf.gz --missing-to-ref -o /ihome/pempey/anh151/results -matcher --research-mode cyp2d6 -cp 63 with error: Traceback (most recent call last):
  File "/ihome/pempey/anh151/pharmcat_bin/preprocessor/pharmcat_pipeline", line 294, in <module>
    preprocessed_vcf = preprocessor.preprocess(
  File "/ihome/pempey/anh151/pharmcat_bin/preprocessor/preprocessor/preprocess.py", line 31, in preprocess
    multisample_vcf = _preprocess(pharmcat_positions_vcf, reference_genome,
  File "/ihome/pempey/anh151/pharmcat_bin/preprocessor/preprocessor/preprocess.py", line 105, in _preprocess
    pgx_variants_vcf: Path = util.extract_pgx_variants(pharmcat_positions_vcf, reference_genome, normalized_vcf,
  File "/ihome/pempey/anh151/pharmcat_bin/preprocessor/preprocessor/utilities.py", line 938, in extract_pgx_variants
    ref_pos_dynamic[input_chr_pos].pop(input_ref_alt)
KeyError: ('T', 'C')

Thanks, Andrew

anh151 commented 7 months ago

It was a data issue. I accidentally duplicated the rows in the VCFs. Sorry for the false alarm.
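
For anyone who hits the same `KeyError`: the traceback ends in a dictionary `.pop(...)` keyed on a (REF, ALT) pair, which is consistent with the same record appearing twice in the input, since the second `pop` of a key that was already removed fails. Below is a minimal sketch of a pre-flight check for duplicated rows. `find_duplicate_records` is a hypothetical helper, not part of PharmCAT, and it assumes that "duplicate" means an identical CHROM, POS, REF and ALT.

```python
import gzip
from collections import Counter
from pathlib import Path


def find_duplicate_records(vcf_path):
    """Return {(chrom, pos, ref, alt): count} for records seen more than once.

    Hypothetical helper, not part of PharmCAT. Reads plain-text or
    gzip/bgzip-compressed VCFs (bgzip output is readable as gzip).
    """
    path = Path(vcf_path)
    opener = gzip.open if path.suffix == ".gz" else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines, the header line, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            # The first five VCF columns are CHROM, POS, ID, REF, ALT.
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t", 5)[:5]
            counts[(chrom, int(pos), ref, alt)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Calling `find_duplicate_records(...)` on the input VCF before starting the pipeline returns an empty dict when no record is repeated. Any key it does return is a site worth checking before the preprocessor runs.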