dib-lab / khmer

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
http://khmer.readthedocs.io/
Other
756 stars 295 forks source link

Lack of explicit testing for count function #878

Open camillescott opened 9 years ago

camillescott commented 9 years ago

While count(kmer) is used extensively throughout the test suite, there are no explicit tests for the functionality of count itself on the counting hash or hashbits objects.

@ctb @mr-c

mr-c commented 9 years ago

Good find, thanks!

According to http://ci.ged.msu.edu/job/khmer-master/label=linux/227/cobertura/lib/counting_hh/ there is coverage via other functions of count() so that's something at least.

ctb commented 9 years ago

On Mon, Mar 23, 2015 at 09:37:42AM -0700, Michael R. Crusoe wrote:

Good find, thanks!

According to http://ci.ged.msu.edu/job/khmer-master/label=linux/227/cobertura/lib/counting_hh/ there is coverage via other functions of count() so that's something at least.

yes, this should be easy to fix by copying some of the 'consume' functions from test_counting_hash.py and having them use 'count' instead.

caitsydney commented 8 years ago

Was test_counting_hash.py renamed or moved somewhere else? Can't find it.

standage commented 8 years ago

This is now tests/test_countgraph.py I believe.

caitsydney commented 8 years ago

Cool. So I am working on this and have questions. :) I copied test_get_raw_tables_view and test_3_tables replaced 'consume' with 'count.' They passed.

But I tried the same with test_find_spectral_error_positions_4 and get 'ValueError: k-mer length must equal the k-mer size' - why?

Are there tests that make more sense to use count for?

ctb commented 8 years ago

Good question! consume is used for strings longer than k, which are decomposed into strings of length k which are then counted - this is a core functionality of khmer. count is the Python function that counts a single k-mer, so you have to give it a string of exactly length k.

An easy (and good) approach would be to duplicate the functionality of consume by using a for loop to extract all substrings of length k and count them - individually - using the count function. It's a good test :)

betatim commented 8 years ago

@caitsydney are you still working on this? (I am looking for a simple issue to work on to get into khmer)

ctb commented 7 years ago

This could be usefully done in #1511 as part of the expansion into core counting functionality vs derived classes, hint to self.