dkoslicki / MetaPalette

Metagenomic profiling and phylogenetic distances via common kmers

Traverse training data once for many input samples #3

Open dkoslicki opened 8 years ago

dkoslicki commented 8 years ago

It should be possible to traverse the training database once (in parallel) for many input samples. Currently, samples are classified one at a time, and the training database is traversed again for every sample.

This could probably be accomplished in one of two ways. The smart/tedious way: modify query_per_sequence.cc to accept multiple .jf inputs, put them in a heap, and keep track of them as in count_in_file.cc. The dumb/easy way: loop over the samples inside the function form_y in Classify.py.
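To make the idea concrete, here is a rough Python sketch of the single-traversal pattern. It is not the MetaPalette code: it uses in-memory `Counter` objects in place of Jellyfish .jf files, and the names `count_kmers` and `common_kmer_counts` are hypothetical helpers made up for illustration. The point is only that the training k-mers are streamed once while every sample is queried in the same pass.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every k-mer in a sequence (stand-in for a per-sample .jf file)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def common_kmer_counts(training_kmers, sample_counts):
    """Stream the training k-mers once and query all samples in that pass.

    training_kmers: iterable of (genome_name, kmer) pairs, consumed once.
    sample_counts:  dict mapping sample name -> Counter of k-mer counts.
    Returns a dict mapping sample name -> Counter of
    genome_name -> total sample count of k-mers shared with that genome.
    """
    totals = {name: Counter() for name in sample_counts}
    for genome, kmer in training_kmers:  # the only traversal of the training data
        for name, counts in sample_counts.items():
            c = counts.get(kmer, 0)
            if c:
                totals[name][genome] += c
    return totals


# Toy data: two "genomes" and two "samples", k = 3 (all assumed for illustration).
training = [("g1", "ACG"), ("g1", "CGT"), ("g2", "TTT")]
samples = {"s1": count_kmers("ACGT", 3), "s2": count_kmers("TTTT", 3)}

# iter(...) makes the training stream single-use, as a database pass would be.
totals = common_kmer_counts(iter(training), samples)
```

With real data the inner loop over samples would presumably be the part to parallelize, and the dictionary lookups would be replaced by queries against each sample's k-mer database.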