Interpreting query output

Colelyman commented 5 years ago

Is there any documentation that explains what the output of the query subcommand means? For example, I have built the bt structure for the human genome and I supply the query subcommand with a path to a file where there are kmers (the same length for which the bt was built) and the

For each kmer I get the following printed to stderr:

    Prematch: 0, Tail Index: 0
    -1
    Total weight: 1, thresh: 0.9, maxkmer: 1

and printed to stdout:

    *AAAAAAAAAAAAAAAAAAAAAAAAAAA 1
    ssbt_genome//Homo_sapiens.GRCh38.dna.primary_assembly.chromo_only.fa.sim.bf.bv.rrr

Any help interpreting this output would be greatly appreciated.

Thank you, Cole

Bradsol commented 5 years ago

stdout is the actual 'output' of the query. Each result starts with a header line of the form (*<KMER> <# of matches>), followed by a line-separated list of the individual hits. This format is consistent with the SBT and should be described in the manual.
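For anyone who wants to post-process that stdout, here is a minimal parsing sketch based only on the format described above. The names `QueryResult` and `parse_query_output` are hypothetical and not part of SSBT, and the sketch assumes the match count is the last whitespace-separated field on the header line.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class QueryResult:
    """One query block: the header line plus the hit lines that follow it."""
    kmer: str
    reported_matches: int
    hits: List[str] = field(default_factory=list)


def parse_query_output(lines: Iterable[str]) -> List[QueryResult]:
    """Group hit lines under the '*<KMER> <# of matches>' header before them.

    Assumption: every header starts with '*' and every other non-empty
    line is a hit belonging to the most recent header.
    """
    results: List[QueryResult] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("*"):
            # Split off the trailing count; the rest is the query k-mer.
            kmer, count = line[1:].rsplit(maxsplit=1)
            results.append(QueryResult(kmer, int(count)))
        elif results:
            results[-1].hits.append(line)
    return results


if __name__ == "__main__":
    import sys

    # Usage: <query command> 2>/dev/null | python parse_hits.py
    for result in parse_query_output(sys.stdin):
        print(result.kmer, result.reported_matches, len(result.hits))
```

Comparing `reported_matches` with `len(hits)` for each block is a quick sanity check that the output was parsed as intended.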

The stderr lines are debug statements indicating how many kmers were found previously (Prematch) and how many kmers have been proven to not exist [in order] (Tail Index). Which version of SSBT are you using?

Colelyman commented 5 years ago

Thanks @Bradsol! I am using the current version on master.

Kingsford-Group / splitsbt

Interpreting query output #11