LLNL / fpzip

Lossless compressor of multidimensional floating-point arrays
http://fpzip.llnl.gov
BSD 3-Clause "New" or "Revised" License
104 stars 16 forks source link

Bounds Checking on fpzip buffer Functions #5

Open william-silversmith opened 2 years ago

william-silversmith commented 2 years ago

Hi Dr. Lindstrom,

I was working on integrating a WASM fpzip decoder into a web viewer (Neuroglancer). However, the maintainer felt it would be best if fpzip performed bounds checking on arrays for the functions. It can cause buffer overflow issues if the number of bytes is not provided.

There may be others, but those are the big ones. It seems like this would be backwards incompatible without adding new functions, so perhaps fpzip_read_from_buffer2 and fpzip_read2 would be better?

Thanks so much for your work. Let me know if you need any updates to https://github.com/seung-lab/fpzip . I could potentially help a PR for this as well.

Will

lindstro commented 1 year ago

@william-silversmith My sincere apologies for not responding to this sooner. I don't recall seeing this issue when you first submitted it, and as you probably know, zfp is taking up most of my time these days, so I don't check here often. In fact, it's been long enough since I worked on fpzip that I had to take some time to figure out what these functions do. :-)

So fpzip_read_from_buffer really does not do any reading but just associates an in-memory buffer (instead of a file) with the input that fpzip_read will later read from. I'm guessing that the num_bytes parameter you suggest would also specify the size of that in-memory buffer, in case the buffer has somehow been corrupted and fpzip would read past the end of it. That's a reasonable concern toward guarding against potential segmentation faults in case the compressed data were corrupted, for instance, but would also carry some performance penalty in having to check against the end of buffer for every byte read. I'd have to dig into the guts of the code to see what the impact might be. I'll note that zfp does not perform such bounds checking, either, for the same performance reasons (though zfp has even more strict performance requirements).

Regarding fpzip_read, I somewhat disagree here since the user already has to set up the FPZ structure to declare the exact size of the data buffer in terms of array dimensions (I'm here assuming that num_bytes specifies the size buffer pointed to by data). If the user gets this wrong, what's to safeguard against them not getting num_bytes wrong, with basically the same consequences? Maybe I'm missing a use case where this would help, but to me this seems a little bit like requiring users of C pointers to not be able to dereference a pointer, *ptr, without also specifying the byte size of memory pointed to, e.g., *(ptr, num_bytes), using some fictitious notation. At some level you have to trust that the user knows what they're doing.