PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Handle edge-case where all prior allele frequencies are zero #145

Closed (timothymillar closed this 2 years ago)

timothymillar commented 2 years ago

Originally reported by @Cristianetaniguti for mchap call-exact v0.6.0 (thanks!).

Traceback (most recent call last):
  File "/usr/local/bin/mchap", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/mchap/application/cli.py", line 32, in main
    prog.cli(sys.argv).run_stdout()
  File "/usr/local/lib/python3.8/dist-packages/mchap/application/baseclass.py", line 425, in run_stdout
    self._run_stdout_multi_core()
  File "/usr/local/lib/python3.8/dist-packages/mchap/application/baseclass.py", line 410, in _run_stdout_multi_core
    for locus in self.loci():
  File "/usr/local/lib/python3.8/dist-packages/mchap/application/call_baseclass.py", line 17, in loci
    locus = Locus.from_variant_record(
  File "/usr/local/lib/python3.8/dist-packages/mchap/io/loci.py", line 167, in from_variant_record
    frequencies /= frequencies.sum()
RuntimeWarning: invalid value encountered in true_divide

timothymillar commented 2 years ago

This is most likely the result of trying to normalize prior allele frequencies that are all zero. It can happen when using posterior frequencies from mchap assemble as the prior, if every posterior frequency is low enough (<0.0005) to be rounded to zero in the VCF.
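A minimal sketch of how this can arise, using NumPy only. The frequency values are made up, and the three-decimal rounding is an assumption inferred from the <0.0005 threshold above, not taken from the MCHap source:

```python
import numpy as np

# Hypothetical posterior frequencies, all below 0.0005
posteriors = np.array([0.0004, 0.0003, 0.0001])

# Assumed VCF precision of three decimal places: every value rounds to zero
frequencies = np.round(posteriors, 3)

# Normalizing divides 0.0 by 0.0, which NumPy flags as an invalid value.
# The warning is silenced here only so the result can be inspected.
with np.errstate(invalid="ignore"):
    normalized = frequencies / frequencies.sum()

print(frequencies.sum())  # the sum of the rounded frequencies is zero
print(normalized)         # every entry is NaN
```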

This is essentially an invalid prior, so the correct approach is probably to skip the MCMC and report unknown alleles for all samples at that locus.
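One way the check could look, as a rough sketch. `normalize_prior` is a hypothetical helper, not an existing MCHap function, and returning `None` to signal an invalid prior is an assumption made for illustration:

```python
import numpy as np

def normalize_prior(frequencies):
    """Return frequencies scaled to sum to 1, or None if the prior is invalid.

    Hypothetical helper: a prior is treated as invalid when its sum is
    zero, negative, or not finite.
    """
    frequencies = np.asarray(frequencies, dtype=float)
    total = frequencies.sum()
    if not np.isfinite(total) or total <= 0:
        return None
    return frequencies / total

# An all-zero prior is rejected instead of producing NaN values
print(normalize_prior([0.0, 0.0, 0.0]))

# A valid prior is normalized as usual
print(normalize_prior([1.0, 1.0, 2.0]))
```

On a `None` result, the caller would skip the MCMC for that locus and report unknown alleles for every sample.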

timothymillar commented 2 years ago

Done in #147