algo-cancer / ImmunoTyper-SR

Genotyping Immunoglobulin Heavy Chain Variable Genes using Short Read Data
Other
8 stars 1 forks source link

python 'ZeroDivisionError: float division by zero' error #7

Closed dduchen closed 1 year ago

dduchen commented 1 year ago

Hello, Thank you for developing the tool! I'm attempting to use it but am getting a division by 0 error early on in the process, after extracting reads: ".../miniconda3-intel/envs/immunotyper-SR/lib/python3.8/site-packages/immunotyper/allele_database.py", line 227, in make_landmarks self.depth_coefficient = 1/(self.read_length/float(self.sampled_depth)) ZeroDivisionError: float division by zero"

Any recommendations? I'd be happy to provide a smallish BAM file to reproduce the error if that would be helpful.

michael-ford commented 1 year ago

Hey there, could you share what specific genome reference version your WGS BAM is mapped to? Are you using the whole WGS BAM or a subset of it (i.e. just chr 14)? If you could share the stdout and stderr output when you run the tool that would be the most helpful. I suspect the problem is it is not calculating the read depth correctly - if so I'll need to add a more informative error message.

dduchen commented 1 year ago

Hello, Thank you that makes sense - while I've performed mapping/alignment using the GCA_000001405.15_GRCh38_no_alt_analysis_set reference, the sequencing data was amplified directly from immunoglobulin locus, so it's a subset of chr14. Are there a set of parameters/options I could use to avoid this error?

Adding the stdout ouput below:

Loading reads from ./SRR14814123.bam...
Contig chr14_KI270846v1_alt not present in alignment file
Recruiting reads from chr14...
Recruiting reads from chr14_KI270726v1_random...
Recruiting reads from chr16...
Recruiting reads from chr16_KI270728v1_random...
Contig chrUn_KN707869v1_decoy not present in alignment file
Contig chrUn_KN707869v1_decoy not present in alignment file
Recruiting reads from chr15...
Contig chrUn_JTFH01000119v1_decoy not present in alignment file
Recruiting reads from chr21...
Contig chr15_KI270851v1_alt not present in alignment file
Recruiting reads from chr2...
Contig chrUn_KN707925v1_decoy not present in alignment file
Contig chrUn_KN707925v1_decoy not present in alignment file
Contig chrUn_JTFH01000620v1_decoy not present in alignment file
Extracting mapped alignments using 
 samtools view -h ./SRR14814123.bam chr14:105586000-106881000 chr14_KI270726v1_random:3018-43878 chr16:31950180-34015686 chr16_KI270728v1_random:1107945-1315145 chr15:19962667-22362911 chr21:10649150-10649945 chr2:94952908-94953700  | samtools fasta - | sed 's,/,-,' > immunotyper_SRR14814123/SRR14814123-extracted.fa
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 1223507 reads
Recruited 1223507 reads
Extracting unmapped alignments using 
 samtools fasta -f 0x4 ./SRR14814123.bam | sed -E s,/(.),-\1-un, >> immunotyper_SRR14814123/SRR14814123-extracted.fa
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 200955 reads
Recruited 200955 reads
Sampled read length: 300
Making allele landmark positions (n=36, groups=6)
Traceback (most recent call last):
  File "/Users/dylanduchen/miniconda3-intel/envs/immunotyper-SR/bin/immunotyper-SR", line 8, in <module>
    sys.exit(main())
  File "/Users/dylanduchen/miniconda3-intel/envs/immunotyper-SR/lib/python3.8/site-packages/immunotyper/__main__.py", line 107, in main
    run_immunotyper(args.bam_path, args.ref, args.gene_type, args.hg37, args.output_dir, args.landmark_groups, args.landmarks_per_group, args.max_copy, args.stdev_coeff, args.seq_error_rate, args.write_cache_path, args.solver_time_limit, args.threads)
  File "/Users/dylanduchen/miniconda3-intel/envs/immunotyper-SR/lib/python3.8/site-packages/immunotyper/__main__.py", line 147, in run_immunotyper
    allele_db.make_landmarks(landmark_groups*landmarks_per_group, READ_LENGTH, READ_DEPTH, VARIANCE, EDGE_VARIANCE, 50, landmark_groups)
  File "/Users/dylanduchen/miniconda3-intel/envs/immunotyper-SR/lib/python3.8/site-packages/immunotyper/allele_database.py", line 227, in make_landmarks
    self.depth_coefficient = 1/(self.read_length/float(self.sampled_depth))
ZeroDivisionError: float division by zero
michael-ford commented 1 year ago

Thats a really interesting sounding dataset! So immunotyper samples the read depth and variance from a region on Chr 2, which is obviously not present in your dataset, which is why you're getting the error. Hang tight I'll push an update that allows inputting those features as arguments.

We haven't tested immunotyper on any capture-based sequencing data yet, although its definitely something that I would like to support. However our reliance on semi-stable read depth across the IGHV genes may present a problem, if the read depth is significantly more variable than with standard illumina WGS. Just something to consider at this stage, happy to work with you if there are problems.