Closed by alexlancaster 7 years ago
@sjmack: the problems had a common origin: generating the genotype output table. The fix I just pushed to master (i.e. just run git pull and ./setup.py build again) should fix both the common genotype output and the Guo & Thompson output. I added one of your files (the non-dash one) to the test suite (can you re-run py.test -s -v?). Is it worth adding the dash version as well?
In any case, good catch! More test data files like this will improve our overall test coverage and catch more issues like this one. I would encourage @kosoegawa and others to open issues and attach files like this.
The dash version is only useful for testing against older versions of PyPop, but that is necessary for validation, so ....
I'll run some tests!
I'll add it to the test suite as well then, since it should handle dashes as well as colons. Right now, we would have problems only if you used the genotype separator (which I've hardcoded as a tilde, ~) within the allele identifier. Eventually even this should be removed (see #14) so there would be no "special" character that you would have to avoid, but this would be a more major internal architectural change.
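To illustrate the separator caveat, here is a minimal sketch. It is not PyPop's actual code; it only assumes that a genotype is stored as two allele names joined by the tilde. The names GENOTYPE_SEPARATOR, join_genotype and split_genotype are hypothetical.

```python
GENOTYPE_SEPARATOR = "~"  # assumed to mirror the hardcoded separator mentioned above


def join_genotype(allele1, allele2):
    """Join two allele names into one genotype string (hypothetical helper)."""
    return allele1 + GENOTYPE_SEPARATOR + allele2


def split_genotype(genotype):
    """Split a genotype string back into its two allele names.

    Raises ValueError when the separator also occurs inside an allele name,
    because the split is then ambiguous.
    """
    parts = genotype.split(GENOTYPE_SEPARATOR)
    if len(parts) != 2:
        raise ValueError("ambiguous genotype: %r" % genotype)
    return parts[0], parts[1]


# Colons and dashes inside allele names round-trip without trouble.
print(split_genotype(join_genotype("01:01", "02:01")))
print(split_genotype(join_genotype("01-01", "02-01")))

# A tilde inside an allele name makes the split ambiguous.
try:
    split_genotype(join_genotype("01~01", "02:01"))
except ValueError as err:
    print("error:", err)
```

Any single-character separator has this limitation, which is presumably why removing it altogether would need the larger change discussed in #14.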
@sjmack, is this issue fixed from your POV? If so, I'll close.
The missing common genotypes and incorrect stats issues are resolved; however Issue #19 discusses an additional issue with the common genotypes (which I should have noted using the *_dash.pop version of the data, but didn't).
py.test -s -v also shows some failures, as py.test did earlier.
:pypop sjmack$ py.test -s -v
================================================================= test session starts ==================================================================
platform darwin -- Python 2.7.13, pytest-3.1.2, py-1.4.34, pluggy-0.4.0 -- /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
cachedir: .cache
rootdir: /Applications/PyPop/pypop, inifile:
collected 5 items
tests/test_AlleleColon.py::test_AlleleColon_HardyWeinberg PASSED
tests/test_AlleleColon.py::test_AlleleColon_Emhaplofreq PASSED
tests/test_GenotypeCommon.py::test_GenotypeCommon_HardyWeinberg FAILED
tests/test_GenotypeCommon.py::test_GenotypeCommonDash_HardyWeinberg FAILED
tests/test_gthwe.py::test_gthwe SKIPPED
======================================================================= FAILURES =======================================================================
__________________________________________________________ test_GenotypeCommon_HardyWeinberg ___________________________________________________________
    def test_GenotypeCommon_HardyWeinberg():
        exit_code = base.run_pypop_process('./tests/data/WS_BDCtrl_Test_HW.ini', './tests/data/BIGDAWG_SynthControl_Data.pop')
        # check exit code
        assert exit_code == 0
        # compare with md5sum of output file
>       assert hashlib.md5(open("BIGDAWG_SynthControl_Data-out.txt", 'rb').read()).hexdigest() == '276263b0d0d9fc03b77826388d70510d'
E       AssertionError: assert 'c0d952cbc16e...3c849ea4e2095' == '276263b0d0d9f...826388d70510d'
E         - c0d952cbc16e90f5bd03c849ea4e2095
E         + 276263b0d0d9fc03b77826388d70510d
tests/test_GenotypeCommon.py:11: AssertionError
________________________________________________________ test_GenotypeCommonDash_HardyWeinberg _________________________________________________________
    def test_GenotypeCommonDash_HardyWeinberg():
        exit_code = base.run_pypop_process('./tests/data/WS_BDCtrl_Test_HW.ini', './tests/data/BIGDAWG_SynthControl_Data_dash.pop')
        # check exit code
        assert exit_code == 0
        # compare with md5sum of output file
>       assert hashlib.md5(open("BIGDAWG_SynthControl_Data_dash-out.txt", 'rb').read()).hexdigest() == 'b0f4247a2a67a65d0109b6448427cc28'
E       AssertionError: assert '93ff300c62a3...bd001ce8c592b' == 'b0f4247a2a67a...9b6448427cc28'
E         - 93ff300c62a3056c4cdbd001ce8c592b
E         + b0f4247a2a67a65d0109b6448427cc28
tests/test_GenotypeCommon.py:18: AssertionError
=================================================== 2 failed, 2 passed, 1 skipped in 157.40 seconds ====================================================
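Both failures above are checksum mismatches: each test hashes the generated output file and compares the result with a stored MD5 digest, so any change in the output changes the digest. A minimal sketch of that kind of comparison, using only the standard library, is below. file_md5 is a hypothetical helper, not part of the PyPop test suite, and the demo hashes a throwaway temporary file rather than real PyPop output.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file (a stand-in for a PyPop "-out.txt" file).
with tempfile.NamedTemporaryFile(delete=False, suffix="-out.txt") as tmp:
    tmp.write(b"example output\n")
    demo_path = tmp.name

try:
    first = file_md5(demo_path)
    # Appending a single byte changes the digest, so an equality check would fail.
    with open(demo_path, "ab") as handle:
        handle.write(b"!")
    second = file_md5(demo_path)
    print(first != second)
finally:
    os.remove(demo_path)
```

A digest mismatch only says that the output changed, not where; diffing the new output against a known-good copy is the usual next step.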
I'll respin the tests as per my comment in #4 and if that works, I'll close this particular issue.
Hi @sjmack, let me know if the tests look OK and I'll close.
py.test: 11 passed, 1 skipped in 75.75 seconds.
Looks good.
Transferring from issue #4 comment https://github.com/alexlancaster/pypop/issues/4#issuecomment-313516210 originally by @sjmack:
However, I have constructed a test data file (the controls from the BIGDAWG synthetic datafile) that reveals several issues with the current HW implementations (vs version 0.7.0).
I'm attaching two versions of this test file: BIGDAWG_SynthControl_Data.pop.txt and BIGDAWG_SynthControl_Data_dash.pop.txt. And the associated .ini file: WS_BDCtrl_Test_HW.ini.txt. Be sure to remove the .txt suffixes.
The difference between the two datasets is that the _dash.pop file has the colons converted to dashes. I did this so that I could compare the current developmental version of PyPop on my Mac to v0.7.0 running on my PC. I could only run the _dash.pop file on my PC.
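A conversion like this can be scripted. The sketch below is an assumption about how it might be done, not a record of how the _dash.pop file was actually produced. It replaces every colon in the text, which is only safe if colons occur nowhere except in allele names. colons_to_dashes is a hypothetical helper, and the sample row is made up.

```python
def colons_to_dashes(text):
    """Replace every colon with a dash (assumes colons appear only in allele names)."""
    return text.replace(":", "-")


# Made-up tab-separated row, for illustration only (not taken from the real data file).
row = "sample1\t01:01\t02:01\n"
print(colons_to_dashes(row), end="")
```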
I have three sets of results. The .git. versions were generated using this development version of PyPop, and the .070. versions with the current release version.
First of all, you will notice that there are no Common Genotypes being generated with the development version. The results below show the dash datasets, but the same happens when colons are included for the developmental version.
Compare (Git):
With (v0.7.0):
In addition, the stats being reported for the developmental version include errors, especially for the Chen and Diff tests, where obs and exp values are 0. The differences in the p-values for the mcmc results probably stem from the Markov chain, but I only did each one once, so I'm not certain.
Compare (Git):
to (version 0.7.0):
Here are the results:
BIGDAWG_SynthControl_Data-out.git.txt BIGDAWG_SynthControl_Data-out.git.xml.txt BIGDAWG_SynthControl_Data_dash-out.git.txt BIGDAWG_SynthControl_Data_dash-out.git.xml.txt BIGDAWG_SynthControl_Data_dash-out.070.txt BIGDAWG_SynthControl_Data_dash-out.070.xml.txt
Again, remove the .txt from the .XML filenames.