Even for moderately sized input files (e.g., 5k), kmerize takes over an hour, which is far too long. The problem was introduced by the previous fix for the memory issue. It is in the make_feature_matrix function in vectorize.py, which builds the matrix from individual lists in a very slow way.
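The code in make_feature_matrix is not reproduced here, so the following is only a sketch of one common way this kind of slowdown arises and how it might be avoided. It assumes a dense NumPy matrix and equal-length rows; the names build_matrix_slow, build_matrix_fast, and n_features are hypothetical and not taken from vectorize.py.

```python
import numpy as np


def build_matrix_slow(rows):
    """Hypothetical slow pattern: grow the matrix one row at a time.

    Each np.vstack call copies everything built so far, so the total
    work grows quadratically with the number of rows.
    """
    matrix = np.empty((0, len(rows[0])), dtype=np.float32)
    for row in rows:
        matrix = np.vstack([matrix, np.asarray(row, dtype=np.float32)])
    return matrix


def build_matrix_fast(rows, n_features):
    """Hypothetical faster pattern: allocate once, then fill in place.

    Only one matrix-sized allocation is made, so the memory footprint
    stays predictable and the work is linear in the number of rows.
    """
    matrix = np.zeros((len(rows), n_features), dtype=np.float32)
    for i, row in enumerate(rows):
        matrix[i, :] = row
    return matrix


# Toy data standing in for per-sequence k-mer count lists (assumption).
rows = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
slow = build_matrix_slow(rows)
fast = build_matrix_fast(rows, n_features=3)
```

Both functions produce the same matrix. Preallocating keeps a single full-size array in memory, which may matter if the earlier memory fix is the reason the matrix is built row by row.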