MinghaoYan closed this issue 4 years ago.
No, you did not do anything wrong. 10 KB of data is simply far too small for COBS; you can search that amount of data trivially fast anyway. The output shows the parameters COBS chose for your input, and those parameters determine the final index size.
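To illustrate how the chosen parameters drive the size, here is a minimal sketch of standard Bloom-filter sizing, the kind of per-document calculation a Bloom-filter-based index has to make. It assumes k hash functions and a target false positive rate p, and solves p = (1 - e^(-k·n/m))^k for the number of bits m given n distinct terms. The function names and the example values (31-character terms, 1 hash, p = 0.3, a 1,400-base document) are illustrative assumptions, not COBS's actual code or confirmed defaults.

```python
import math


def bloom_filter_bits(num_terms: int, num_hashes: int, false_positive_rate: float) -> int:
    """Bits needed so that num_terms items inserted with num_hashes hash
    functions give at most the requested false positive rate.

    Solves p = (1 - exp(-k*n/m))**k for m, then rounds up.
    """
    if num_terms <= 0:
        return 0
    k = num_hashes
    per_hash = false_positive_rate ** (1.0 / k)
    return math.ceil(-k * num_terms / math.log(1.0 - per_hash))


def max_kmers(sequence_length: int, term_size: int) -> int:
    """Upper bound on distinct terms (k-mers) in one sequence."""
    return max(0, sequence_length - term_size + 1)


# Hypothetical document: 1,400 bases, 31-mers, 1 hash, p = 0.3 (assumed values).
n = max_kmers(1400, 31)
bits = bloom_filter_bits(n, num_hashes=1, false_positive_rate=0.3)
print(n, bits, round(bits / n, 2))
```

Under these assumptions the filter costs a few bits per distinct term, so for inputs this small any fixed per-file overhead (for example, the padding the serialization code writes) makes up a large share of the file.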
Dear Authors,
We are trying to benchmark COBS's memory usage and model (index) size on some FASTA files. We followed the commands in the tutorial exactly, for both the C++ and the Python versions, and found that the output file of the compact-construct command (which is later fed to the query command) is extremely large: several times the size of the original data.
For instance, the seven sample files in /tests/data/fasta add up to 10KB. After running the following commands,
    src/cobs doc-list tests/data/fasta/
    src/cobs compact-construct tests/data/fasta/ example.cobs_compact
the example.cobs_compact file is 69K in size.
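One way to make this size comparison reproducible is to sum the byte sizes of the input files and divide the index file size by that total. This is a minimal sketch: the helper names are hypothetical, the default FASTA suffixes are an assumption about how the test files are named, and it measures on-disk file size only, not memory used during construction or querying.

```python
from pathlib import Path


def total_input_bytes(directory: str, suffixes=(".fa", ".fasta")) -> int:
    """Sum the sizes of files directly under `directory` whose suffix matches."""
    return sum(
        p.stat().st_size
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


def size_ratio(index_path: str, input_dir: str, suffixes=(".fa", ".fasta")) -> float:
    """Index file size divided by the total size of the input files."""
    inputs = total_input_bytes(input_dir, suffixes)
    if inputs == 0:
        raise ValueError(f"no matching input files found in {input_dir!r}")
    return Path(index_path).stat().st_size / inputs


# Example (paths taken from the commands above; adjust suffixes if needed):
# print(size_ratio("example.cobs_compact", "tests/data/fasta"))
```

`st_size` is the apparent size in bytes, which can differ from what `du` reports, since `du` counts allocated disk blocks. With the numbers reported here (about 10 KB of input and a 69K index), the ratio is roughly 7.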
We looked into the serialization code and saw that "padding" is written to the serialized file.
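Padding in a serialized file usually exists to align data to some boundary, such as a page size, so that the file can be memory-mapped efficiently; whether and how COBS does this should be checked in its serialization code. The sketch below only shows the arithmetic of aligning a payload after a header. The helper names and the 4096-byte alignment are hypothetical values chosen for illustration, not taken from the COBS source.

```python
def padding_needed(offset: int, alignment: int) -> int:
    """Bytes to add so that `offset` becomes a multiple of `alignment`."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (-offset) % alignment


def padded_size(payload_bytes: int, header_bytes: int, alignment: int) -> int:
    """Total file size when a header is followed by padding, then an aligned payload."""
    return header_bytes + padding_needed(header_bytes, alignment) + payload_bytes


# Hypothetical: 100-byte header, 500-byte payload, 4096-byte alignment.
total = padded_size(500, 100, 4096)
pad = padding_needed(100, 4096)
print(total, pad, round(pad / total, 2))
```

For a payload of a few hundred bytes, one page-sized alignment adds several kilobytes, while for a large index the same padding is negligible, so the padding share of the file shrinks as the input grows.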
We would like to know whether we did something wrong in the construction process or in measuring the size of the model, and what exactly is written into the serialized files.
Thank you very much!