I've used random sequence generation in many projects, so I thought adding it to btllib would benefit future work. This PR adds an API for generating random DNA/RNA/protein sequences and a recipe for generating random datasets.
Key features added:

- `btllib::RandomSequenceGenerator` class for generating nucleotide and amino acid sequences, with optional soft/hard masking.
- `seqgen` recipe for efficiently generating random datasets with `btllib::RandomSequenceGenerator`. Note that it does not support protein sequences yet, because `SeqWriter` doesn't have a module to support them.
- `argparse` subproject for parsing command-line arguments through a more intuitive API. Recipes that use `argparse` should be compiled with `cpp_std=c++17`.
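For reviewers unfamiliar with the masking options, here is a minimal standalone sketch of the general idea using only `<random>`. It is **not** the `btllib::RandomSequenceGenerator` API: the names `random_sequence`, `random_fasta` and `MaskMode`, the per-position `mask_rate`, and the masking semantics (soft masking lowercases a residue, hard masking replaces it with a fixed character such as `N` for nucleotides or `X` for amino acids) are all hypothetical assumptions for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <random>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical masking modes; btllib's real option names may differ.
enum class MaskMode { NONE, SOFT, HARD };

// Hypothetical helper: draw `length` characters uniformly from `alphabet`
// (e.g. "ACGT" for DNA, "ACGU" for RNA, "ACDEFGHIKLMNPQRSTVWY" for protein)
// and mask each position independently with probability `mask_rate`.
// SOFT lowercases the residue; HARD replaces it with `hard_mask_char`.
std::string random_sequence(std::size_t length, const std::string& alphabet,
                            MaskMode mode, double mask_rate,
                            char hard_mask_char, std::mt19937& rng) {
  if (alphabet.empty()) {
    throw std::invalid_argument("alphabet must not be empty");
  }
  if (mask_rate < 0.0 || mask_rate > 1.0) {
    throw std::invalid_argument("mask_rate must be in [0, 1]");
  }
  std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
  std::bernoulli_distribution masked(mask_rate);

  std::string seq;
  seq.reserve(length);
  for (std::size_t i = 0; i < length; ++i) {
    char c = alphabet[pick(rng)];
    // Only draw a masking decision when masking is enabled.
    if (mode != MaskMode::NONE && masked(rng)) {
      c = (mode == MaskMode::SOFT)
              ? static_cast<char>(std::tolower(static_cast<unsigned char>(c)))
              : hard_mask_char;
    }
    seq.push_back(c);
  }
  return seq;
}

// Hypothetical dataset writer: `num_seqs` unmasked records of fixed
// `length` in FASTA format, with headers ">seq0", ">seq1", ...
std::string random_fasta(std::size_t num_seqs, std::size_t length,
                         const std::string& alphabet, std::mt19937& rng) {
  std::ostringstream out;
  for (std::size_t i = 0; i < num_seqs; ++i) {
    out << ">seq" << i << '\n'
        << random_sequence(length, alphabet, MaskMode::NONE, 0.0, 'N', rng)
        << '\n';
  }
  return out.str();
}
```

Seeding the generator (e.g. `std::mt19937 rng(42);`) makes a dataset reproducible, which is usually what you want when random sequences serve as test fixtures.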