mkmh
Make kmers, minimizers, hashes, and MinHash sketches (with multiple k), and compare them.
Usage
To use mkmh functions in your code:
- Include the header file in your code
#include "mkmh.hpp"
- Compile the library:
cd mkmh && make lib
- Make sure the lib and header are on the LD include/lib paths (e.g. in your makefile):
`` gcc -o my_code my_code.cpp -L/path/to/mkmh -I/path/to/mkmh -lmkmh
- That's it!
Available functionality
Convenience functions:
- Reverse complement a string
- Reverse a string
- Capitalize the characters of a string
- Check if a string contains only canonical DNA letters ("A", "a", "C", "c", "T", "t", "G", "g")
Substrings and transforms:
- Get the forward shingles of a string
- Get the kmers size k of a string
- For multiple k, Get the kmers of a string for all k
- Get the (w, k) minimizers of a string
- Calculate the 64-bit hashes of the kmers of a string (with either single or multiple k values)
- Get the MinHash sketch of a string (from either single or multiple k values), using either the top s hashes or the bottom s hashes.
Compare sets of shingles / kmers / minimizers / hashes:
- Take the union of two sets of kmers or hashes.
- Take the intersection of two sets of kmers or hashes.
Fun extras:
- Given a string and a set of query strings, sort the queries in order
of percent similarity.
Getting help
Please reach out through github by posting an issue (even if it's just feedback). Email is acceptable as a secondary medium.