EichlerLab / smrtsv2

Structural variant caller
MIT License

TypeError when Genotyping #33

Closed by RSherman15 5 years ago

RSherman15 commented 5 years ago

I am getting the following error during genotyping. It looks like pysam has trouble reading the generated alignments.cram file, although the file appears to be standard as far as I can tell from inspecting it with samtools view. The same underlying pysam TypeError also occurs in the gt_call_sample_insert_delta and gt_call_sample_breakpoint_depth steps.

Building DAG of jobs...
Using shell: /software/centos7/bin/bash
Provided cores: 24
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       gt_call_sample_read_depth
        1

[Thu Apr 25 07:31:47 2019]
rule gt_call_sample_read_depth:
    input: sv_calls/sv_calls.bed, samples/hg002/alignments.cram, altref/alt_info.bed
    output: samples/hg002/temp/depth_delta.tab, samples/hg002/depth_delta_stats.tab
    jobid: 0
    wildcards: sample=hg002

Traceback (most recent call last):
  File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/scripts/genotype/GetReadDepthDiff.py", line 187, in <module>
    get_read_depth(df_bed.loc[:, ['VAR_CONTIG', 'VAR_MIDPOINT']], args.bam, args.mapq)
  File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/scripts/genotype/GetReadDepthDiff.py", line 71, in get_read_depth
    for segment in bam_file.fetch(contig, pos, pos + 1):
  File "pysam/libcalignmentfile.pyx", line 1074, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
  File "pysam/libcalignmentfile.pyx", line 1879, in pysam.libcalignmentfile.AlignmentFile.get_tid
  File "pysam/libcalignmentfile.pyx", line 516, in pysam.libcalignmentfile.AlignmentHeader.get_tid
  File "pysam/libcutils.pyx", line 125, in pysam.libcutils.force_bytes
TypeError: Argument must be string, bytes or unicode.
[Thu Apr 25 07:31:52 2019]
Error in rule gt_call_sample_read_depth:
    jobid: 0
    output: samples/hg002/temp/depth_delta.tab, samples/hg002/depth_delta_stats.tab

RuleException:
CalledProcessError in line 345 of /home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/genotype.snakefile:
Command ' set -euo pipefail;  python3 -s /home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/scripts/genotype/GetReadDepthDiff.py samples/hg002/alignments.cram sv_calls/sv_calls.bed altref/alt_info.bed samples/hg002/temp/depth_delta.tab --out_stats samples/hg002/depth_delta_stats.tab --mapq 20 --flank 100 ' returned non-zero exit status 1.
  File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/rules/genotype.snakefile", line 345, in __rule_gt_call_sample_read_depth
  File "/home-net/home-4/rsherma8@jhu.edu/bin/packages/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

paudano commented 5 years ago

I think I see what's going on. Your contig names probably don't have a chr prefix, so they are being read as numbers rather than strings, and pysam only accepts a string (or bytes) contig name.

I just pushed fa21c1b9599a29b2880f31717e6d46dff8ea0124. Try pulling that and see if it helps.
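
For readers who hit the same TypeError, here is a minimal sketch of the kind of workaround this diagnosis suggests. It is not necessarily what the commit above does. The idea is to convert each contig identifier to a string before it reaches `bam_file.fetch()`. The helper name `normalize_contig` is hypothetical and is not part of smrtsv2 or pysam.

```python
def normalize_contig(contig):
    """Return a contig identifier as a plain str.

    Hypothetical helper for illustration; it is not part of smrtsv2 or pysam.
    """
    if isinstance(contig, bytes):
        # Byte strings are decoded rather than rendered as "b'...'".
        return contig.decode()
    if isinstance(contig, float) and contig.is_integer():
        # A numeric column that was upcast to float would give 1.0; map it to "1".
        return str(int(contig))
    # Integers (including NumPy integer scalars) and strings end up here.
    return str(contig)


# Intended use inside a loop like the one in the traceback (not executed here):
#     for segment in bam_file.fetch(normalize_contig(contig), pos, pos + 1):
#         ...
```

If the variant table is loaded with pandas, an alternative would be to request a string dtype for the contig column at read time through the `dtype` argument of `pandas.read_csv`, so that names such as 1 or 22 are never parsed as integers in the first place.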

RSherman15 commented 5 years ago

Yes, you're correct, my reference doesn't have chr preceding the numbers. That fix worked, thanks.