Closed jjzieve closed 5 years ago
@KelleyRyanM ignore my previous comment, I see some of the indel logic in GenotypeFormat, I think I can apply that
@jjzieve , Have you or @KyleLevi verified this pull request is still compatible with Python 3? -Ryan
@KelleyRyanM Nope, I can check python3 compatibility and address the other comments
@KelleyRyanM Ran regressions on python 3.5.6 and it worked. Seemed that was the highest version that pysam 0.9.0 had been built on.
@KelleyRyanM Kyle had failed regressions due to precision differences for GQ and LRR with an updated numpy module. I've confirmed with numpy=1.15.2 (I believe he has 1.16.3).
i.e.
regression ->0/0:0.437028:0.0:0.167821
updated numpy -> 0/0:0.43702784:0.0:0.16782084
Is this acceptable since the README points to one version (1.11)? If not, what's the best way to handle this? This SO post seems to outline the problem: https://stackoverflow.com/questions/4844865/how-to-set-the-precision-on-strnumpy-float64 I'm guessing numpy updated the default precision from 6 to 8 or something somewhere between 1.11 and 1.15
@jjzieve , Can't we just format the string when returning from the BAlleleFreq formatter (e.g., return value -> return %.6f' % value)?
@KelleyRyanM yes, but its impacting gencall score also, it looks like it can't easily be fixed without re-generating the truth vcfs for the regressions. Also it appears round(value, 6) might be better than format since it doesn't pad 0.0 to 0.000000 (though maybe we want that?). But I can re-gen the vcfs and see how many values change. Most of the gencalls are 6 decimal places but I see some that are 5, 7, etc.
@KelleyRyanM and @KyleLevi it impacts 35 loci. Maybe I can move that effort to a separate PR/issue? There could be some usefulness in providing the option to specify the precision of the output? But I think that's a bit out of scope for this PR.
Spoke to Ryan, opening a seperate issue for the precision differences with numpy updates.
This is a rebase from develop and some tweaks to @brianhill11's PR (#20). Tweaks include: