charlesdaniels / bitshuffle

BSD 3-Clause "New" or "Revised" License
5 stars 0 forks source link

Add chunk-level hash checking #46

Closed jyn514 closed 6 years ago

jyn514 commented 6 years ago

Addresses https://github.com/charlesdaniels/bitshuffle/issues/4. Includes basic smoketests for decode.

jyn514 commented 6 years ago

I can certainly do that on the decoding end. How many packets are we expecting to carry the file checksum? Only the first and last?

charlesdaniels commented 6 years ago

I can certainly do that on the decoding end. How many packets are we expecting to carry the file checksum? Only the first and last?

That seems reasonable to me.

jyn514 commented 6 years ago

Since we won't be updating the error numbers (nor should we), might I suggest numbering by topic? Perhaps argument errors range from 0-99, decoding errors 100-199, input output 200-299, so forth. That would leave plenty of room for additional errors while still making the number itself somewhat informative.

charlesdaniels commented 6 years ago

Since we won't be updating the error numbers (nor should we), might I suggest numbering by topic? Perhaps argument errors range from 0-99, decoding errors 100-199, input output 200-299, so forth. That would leave plenty of room for additional errors while still making the number itself somewhat informative.

I am not opposed to this scheme, but keep in mind that the end user is most likely going to be looking at the error message, not the error code. The point of having error codes is for string matching in test cases, and to provide something unique to search for in the source code.

That said, it’s probably more useful to do something like this than totally random numbers, so I say go for it.

charlesdaniels commented 6 years ago

Everything looks good to merge except that the file hash should probably only be included on the first and last packet to save space. I would implement this in the for c in chunks loop in encode_file. To accomplish this, I would make encode_packet have a flag which causes the file_hash field to be omitted. encode_file can use seqnum as a counter to see if it is on the first or last.

The boolean expression will look like (seqnum == seqmax) or (seqnum ==0 ).