HDFGroup / hdf5

Official HDF5® Library Repository
https://www.hdfgroup.org/

Stack overflow when parsing corrupt file #4111

Open tbeu opened 8 months ago

tbeu commented 8 months ago

I have a custom application (similar to h5dump) that recursively walks all group objects and datasets in an HDF5 file and retrieves their data. Unfortunately, for a corrupted HDF5 file this recursion leads to a stack overflow. In the API log you can see that, starting at line 127, the same dataset is dereferenced and recursed into again and again. I omitted all further recursions from the log except the first three (lines 127, 147 and 167).

Obviously I am currently not able to detect the file corruption since all return codes signal success and nothing is pushed to the error stack.

I also checked h5dump which is smarter and reports errors.

I wonder how I can do better (apart from introducing some artificial recursion limit) and detect the file corruption. Any support is appreciated. Thanks a lot!
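
For illustration, a minimal sketch of one possible caller-side guard: remember the identity of every object on the current recursion path and stop as soon as an object shows up a second time, with a depth cap as a backstop. The sketch does not use the HDF5 API; `node_t`, `walk_from_root`, `MAX_CHILDREN` and `MAX_PATH_DEPTH` are hypothetical names, and `key` stands in for whatever uniquely identifies an object. In a real reader that identity might be the object token reported by `H5Oget_info3` and compared with `H5Otoken_cmp`, but whether that would catch the corruption in this particular file is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN   4   /* hypothetical fan-out of the stand-in object */
#define MAX_PATH_DEPTH 64  /* assumed hard cap, kept as a backstop */

/* Hypothetical stand-in for an object reached while walking a file.
 * "key" must identify the object uniquely; "children" models the objects
 * reached by dereferencing references stored in it. */
typedef struct node {
    uint64_t key;
    size_t n_children;
    const struct node *children[MAX_CHILDREN];
} node_t;

typedef enum { WALK_OK = 0, WALK_CYCLE = -1, WALK_TOO_DEEP = -2 } walk_status_t;

/* Keys of the objects on the current recursion path (root first). */
typedef struct {
    uint64_t keys[MAX_PATH_DEPTH];
    size_t depth;
} walk_path_t;

/* Return 1 if the key is already on the current path, 0 otherwise. */
static int on_path(const walk_path_t *path, uint64_t key)
{
    for (size_t i = 0; i < path->depth; i++) {
        if (path->keys[i] == key)
            return 1;
    }
    return 0;
}

static walk_status_t walk(const node_t *n, walk_path_t *path, size_t *visits)
{
    assert(n != NULL && path != NULL && visits != NULL);

    /* An object that is its own ancestor means the references form a loop. */
    if (on_path(path, n->key))
        return WALK_CYCLE;
    if (path->depth == MAX_PATH_DEPTH)
        return WALK_TOO_DEEP;

    path->keys[path->depth++] = n->key;
    (*visits)++;

    for (size_t i = 0; i < n->n_children; i++) {
        walk_status_t st = walk(n->children[i], path, visits);
        if (st != WALK_OK) {
            path->depth--;
            return st; /* propagate the first failure to the caller */
        }
    }

    path->depth--; /* leaving this object: it may be reached again elsewhere */
    return WALK_OK;
}

/* Walk everything reachable from root; *visits counts the objects entered. */
walk_status_t walk_from_root(const node_t *root, size_t *visits)
{
    walk_path_t path = { { 0 }, 0 };
    *visits = 0;
    return walk(root, &path, visits);
}
```

Only the current path is tracked rather than every object ever seen, so an object that is legitimately referenced from two places is still visited twice and is not reported as a loop.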

derobins commented 8 months ago

Can you provide the file? We also have a set of file format parsing changes that should arrive before 1.14.4.

tbeu commented 8 months ago

Sure, here it is: so.zip. The dataset in question is /cells_with_structs.

Thanks for looking into this. Any pointers on how clients like h5dump should treat such a file are appreciated.

derobins commented 8 months ago

In our develop branch, I don't see a stack overflow. I just get a normal tools error, which is expected if you are parsing a broken file.

Do you see the stack overflow when you build with our develop branch? That code will become 1.14.4 in a week or so. Thanks!

tbeu commented 8 months ago

Thanks for replying. Yes, it is still reproducible and is likely a caller issue. I attached all the driving code and Windows debug binaries in T4111.ZIP.

Run from a Windows cmd prompt:

set HDF5_DEBUG="1 trace"
rem leads to a stack overflow which I am unable to mitigate
test_T4111.exe clusterfuzz-testcase-minimized-matio_fuzzer-5700246703046656 1>test.log
rem raises a debug assertion (when API tracing is enabled)
h5dump.exe clusterfuzz-testcase-minimized-matio_fuzzer-5700246703046656 1>dump.log

tbeu commented 7 months ago

Any ideas on how to improve this? Thanks.

tbeu commented 4 months ago

Still reproducible with 9fd4fd0b586fc99fea6eafd9038b22bcb73bb9c3