Closed rchikhi closed 3 years ago
Furthermore, my kff file looks like:
$ hexdump 0-0.kff |head -n 10
0000000 0001 001e 0000 7600 0003 0000 0000 0000
0000010 006b 001f 0000 0000 0000 616d 0078 ca00
0000020 3b9a 0000 0000 6164 6174 735f 7a69 0065
0000030 0004 0000 0000 0000 d072 3ef7 0100 0000
0000040 0000 0000 0000 0000 **2900 0001** 0100 0000
0000050 0000 0000 0000 0000 9301 0001 0100 0000
I added **'s to highlight where the first count (0x129) is supposed to be encoded; yet it is printed as 0x29.
Btw, my global variables are:
Section_GV sgv(_outfile);
sgv.write_var("k", _kmerSize);
sgv.write_var("max", 1000000000L); // ¯\_(ツ)_/¯
sgv.write_var("data_size", 4); // DSK counts are stored as uint32_t
I think the problem is due to outstr. There is a problem in how the data is read and output.
There is also another problem: the endianness is undefined. I am still asking myself how to fix it: either impose one of the two endiannesses, or add it as a variable in the global variable section.
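To make the first option concrete, here is a minimal sketch of writing a 32-bit count in one fixed byte order, independent of the host machine. `Bytes4`, `encode_le32` and `decode_le32` are hypothetical names for illustration and are not part of the KFF API; little-endian is chosen arbitrarily here.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of the KFF API): four bytes in a fixed order.
struct Bytes4 { std::uint8_t b[4]; };

// Encode least-significant byte first (little-endian), whatever the host is.
// Shifts operate on the value, not on memory, so the result is portable.
constexpr Bytes4 encode_le32(std::uint32_t v) {
    return Bytes4{{ static_cast<std::uint8_t>(v & 0xFF),
                    static_cast<std::uint8_t>((v >> 8) & 0xFF),
                    static_cast<std::uint8_t>((v >> 16) & 0xFF),
                    static_cast<std::uint8_t>((v >> 24) & 0xFF) }};
}

// Decode the same layout back into a host integer.
constexpr std::uint32_t decode_le32(const Bytes4& x) {
    return  static_cast<std::uint32_t>(x.b[0])
         | (static_cast<std::uint32_t>(x.b[1]) << 8)
         | (static_cast<std::uint32_t>(x.b[2]) << 16)
         | (static_cast<std::uint32_t>(x.b[3]) << 24);
}

// With this layout the count 0x129 is stored as the bytes 29 01 00 00.
static_assert(encode_le32(0x129u).b[0] == 0x29, "low byte comes first");
static_assert(encode_le32(0x129u).b[1] == 0x01, "next byte follows");
static_assert(decode_le32(encode_le32(0x129u)) == 0x129u, "round trip");
```

The resulting bytes could then be written with `out.write(reinterpret_cast<const char*>(bytes.b), 4)` on a `std::ostream`. When checking the result, note that plain `hexdump` prints 16-bit words in host byte order, so `hexdump -C` is easier to read for byte-order questions.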
Maybe an optimization for your code: you set max to a huge number. That means every block is allowed to contain that many kmers. It also implies that the integer storing the number of kmers in a block has to be 4 bytes long (for each block). As far as I know, you store one kmer per block for now, so you should write max=1. That will save the 4 bytes used for the block size in each block.
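The arithmetic behind that can be sketched as follows. `bytes_for_max` is a hypothetical helper, and it assumes the per-block kmer count takes ceil(log2(max)) bits rounded up to whole bytes; that is my reading of the comment above, not something checked against the KFF implementation.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for the per-block kmer count when a block
// may hold at most `max` kmers. Assumption: ceil(log2(max)) bits, rounded up
// to whole bytes.
constexpr unsigned bytes_for_max(std::uint64_t max) {
    unsigned bits = 0;
    // Counting the bits of (max - 1) gives ceil(log2(max)) for max >= 1.
    for (std::uint64_t v = (max > 0 ? max - 1 : 0); v != 0; v >>= 1) {
        ++bits;
    }
    return (bits + 7) / 8;
}

// max = 1000000000 needs 30 bits, hence 4 bytes in every block.
static_assert(bytes_for_max(1000000000ULL) == 4, "huge max costs 4 bytes");
// max = 1 needs no bits at all, so the field can be omitted.
static_assert(bytes_for_max(1) == 0, "max = 1 costs nothing");
```

Under that assumption, a file with one kmer per block and N blocks would shrink by 4 * N bytes when max goes from 1000000000 to 1.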
The outstr error should now be fixed. It was caused by a hardcoded variable value.
Can you please show a small working example for storing large count values? I tried the following
and yet, outstr only shows counts <= 255.