dib-lab / khmer

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
http://khmer.readthedocs.io/

Property based testing with hypothesis #990

Open luizirber opened 9 years ago

luizirber commented 9 years ago

I've read about a new library called hypothesis, which implements property-based testing in Python, similar to what is available in other languages (notably QuickCheck in Haskell). The feature/hypothesis branch has two examples that test our hashing functions and the reverse operation, but it is not trivial to come up with properties to test. This post has some use cases and shows how to test them.

Note that I restricted the inputs to 0 < k < 32 and the alphabet 'ACGT'. Removing these restrictions causes many problems, so we need to strike a balance and decide what counts as valid input (or maybe raise appropriate exceptions when the input is invalid?).
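To make the idea concrete, here is a minimal sketch of a round-trip property written with hypothesis. It does not use khmer's functions: `encode_kmer` and `decode_kmer` are hypothetical stand-ins (a plain 2-bit packing) so that the example runs on its own, and the strategy applies the same restrictions described above ('ACGT' only, 0 < k < 32).

```python
from hypothesis import given, strategies as st

# Hypothetical stand-ins for the hashing functions under test. They are
# NOT khmer's implementation, just a 2-bit packing of each base.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def encode_kmer(kmer):
    """Pack a DNA k-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def decode_kmer(value, k):
    """Unpack an integer produced by encode_kmer back into a k-mer."""
    bases = []
    for _ in range(k):
        bases.append(DECODE[value & 3])  # lowest two bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


# Same restrictions as above: alphabet 'ACGT' and 0 < k < 32.
kmers = st.text(alphabet="ACGT", min_size=1, max_size=31)


@given(kmers)
def test_roundtrip(kmer):
    # Property: decoding an encoded k-mer returns the original k-mer.
    assert decode_kmer(encode_kmer(kmer), len(kmer)) == kmer


@given(kmers)
def test_fits_in_2k_bits(kmer):
    # Property: the packed value never needs more than two bits per base.
    assert 0 <= encode_kmer(kmer) < 4 ** len(kmer)
```

Run under pytest, hypothesis generates many k-mers per test and shrinks any failing input to a minimal counterexample. Loosening the strategy (for example `min_size=0`, or a wider alphabet) is one way to explore what happens outside the restricted input space.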

Code examples

There aren't many examples available, but a search on GitHub lists some options:

https://github.com/URXtech/cmph-cffi/blob/d1cea980177c65bac9c064b3cc8e5fa01a494867/tests/test_cmph.py

https://github.com/tweyter/CityTime/blob/046208435a69eae731ac856be86935211558b879/CityTime_test.py

https://github.com/SethMMorton/fastnumbers/blob/ed269e7d9d2dd1aca24bbcdb21344ce5a2f3d212/tests/test_fastnumbers.py

https://github.com/nedwill/112TermProject/blob/e1719b698c62f002caadd6e3869d7d4a4ad9b6a0/test_tasks.py

mr-c commented 9 years ago

@luizirber you found a bug with this, yes?

luizirber commented 9 years ago

Passing empty strings, strings longer than 32 characters, null-delimited strings, or strings containing characters other than "ACGT" triggers bugs.

mr-c commented 9 years ago

Great find. We need to define what is valid input to these functions and not fail silently with invalid input.
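One possible shape for "fail loudly" is an explicit check that runs before hashing. This is only a sketch: `check_kmer`, `VALID_BASES`, and `MAX_K` are hypothetical names, and the limits simply mirror the 0 < k < 32 and 'ACGT' restrictions mentioned earlier in this thread, not a decision about khmer's API.

```python
# Hypothetical limits, taken from the restrictions discussed in this thread.
VALID_BASES = frozenset("ACGT")
MAX_K = 31


def check_kmer(kmer):
    """Return kmer unchanged, or raise an exception saying why it is invalid."""
    if not isinstance(kmer, str):
        raise TypeError("k-mer must be a str, got %s" % type(kmer).__name__)
    if not kmer:
        raise ValueError("k-mer is empty")
    if len(kmer) > MAX_K:
        raise ValueError("k-mer has length %d, maximum is %d" % (len(kmer), MAX_K))
    # Embedded null characters are caught here too, since "\0" is not a base.
    bad = sorted(set(kmer) - VALID_BASES)
    if bad:
        raise ValueError("k-mer contains invalid characters: %r" % bad)
    return kmer


if __name__ == "__main__":
    # The kinds of input reported above as triggering bugs.
    for candidate in ["", "A" * 40, "AC\0GT", "ACGN", "ACGT"]:
        try:
            check_kmer(candidate)
            print(repr(candidate), "-> ok")
        except ValueError as err:
            print(repr(candidate), "-> rejected:", err)
```

With a check like this in place, a property-based test can assert that every input outside the valid set raises an exception instead of returning a meaningless hash.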

ctb commented 9 years ago

Reminds me a bit of this: http://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext