IQTLabs / hypothesis-bio

Hypothesis extension for computational biology
https://lab41.github.io/hypothesis-bio
Apache License 2.0
15 stars 2 forks source link

Trying to improve unit tests by mocking #36

Closed leoisl closed 5 years ago

leoisl commented 5 years ago

Description

Hello. I am interested in this project, but can't attend the hackseq, so I am unsure if I can contribute.

I tried to improve unit tests, and found a test that actually exposes a bug. It is hard to test this code, and although the functions you are implementing can generate a diverse range of objects (e.g. very diverse fasta/protein sequences), the tests are biased towards the minimal generated values (but maybe I misunderstood the tests, sorry if this is the case). Instead of letting hypothesis draw the random elements for us, which is what makes some tests hard I think, we can mock the draw function and fix the element we want to draw, and this might or might not help testing (the mocking can also inspect the variables passed to e.g. hypothesis's text() or draw(), making sure the arguments are correct, e.g. when you want a random dna sequence with no gaps, you call dna(allow_gaps=False), then you can check if the alphabet given to text() indeed has no gaps).

Going more direct to the point: this commit adds two more tests to test_protein.py: https://github.com/leoisl/hypothesis-bio/commit/28e5316f0ade1e5f242c77f2e4387e0fc34c370d .

The first test: https://github.com/leoisl/hypothesis-bio/blob/28e5316f0ade1e5f242c77f2e4387e0fc34c370d/tests/test_protein.py#L60-L70 fixes the drawn protein to ACD and transform it to AlaCysAsp and everything works as expected.

However, the second test: https://github.com/leoisl/hypothesis-bio/blob/28e5316f0ade1e5f242c77f2e4387e0fc34c370d/tests/test_protein.py#L73-L80 fixes the drawn protein to acd (since we can generate lowercase protein sequences), and tries to transform it to the 3-letter format, but fails because utilities.protein_1to3 does not contain lowercase entries.

If you think this could be useful to the other tests (this should be interpreted just as an example that it could), I can make a PR, otherwise I will close this issue.

vaastav commented 5 years ago

I think using Mocking is a great idea! Although the bug that you are talking of is not really a bug because the protein sequence generated will never be lowercase in the current implementation!

So, what I would suggest is creating a PR for adding the ability to generate lowercase protein sequences as well and adding your Mock tests as part of that. I have created a new issue #39 to track this.

PS Even if you are not a part of Hackseq, you can still contribute :)

leoisl commented 5 years ago

Hello,

Cheers! Thanks for the answer! I think the current implementation can generate lowercase protein sequences! It was added in commit https://github.com/Lab41/hypothesis-bio/commit/27f821bd6eba0b03f4dd235a7261c47b5e292cfb

vaastav commented 5 years ago

Aha, completely missed that.

vaastav commented 5 years ago

Yup, a PR would be great then.