LLNL / zfp

Compressed numerical arrays that support high-speed random access
http://zfp.llnl.gov
BSD 3-Clause "New" or "Revised" License

HIP execution fails on large data #123

Closed lindstro closed 1 year ago

lindstro commented 3 years ago

@lindstro Thanks for the response.
I suspect that for the float data type the function should be "frexpf(float, int*)" and for double it should be "frexp(double, int*)".
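
As a side note on that suspicion: in C the float and double variants must be selected explicitly (frexpf vs. frexp), whereas C++ overload resolution picks the matching one automatically. A minimal sketch, using a hypothetical helper exponent_of that is not part of the zfp sources:

    #include <cmath>  // declares double frexp(double, int*) and float frexpf(float, int*)

    // Hypothetical helper, for illustration only: extract the binary
    // exponent of a value using the overload that matches its type.
    template <typename Scalar>
    int exponent_of(Scalar x)
    {
      int e;
      // std::frexp has float and double overloads, so the correct variant
      // is chosen by the compiler; plain C must call frexpf for float.
      std::frexp(x, &e);
      return e;
    }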

  1. feature/hip-support branch test case, decompressor fails:

     Compressor: ./zfp -d -1 630000000 -r 4 -x hip -i /Data/SDRBENCH-NWChem-dataset/acd-tst.bin.d64 -z test.zfp
     Compressor output: type=double nx=630000000 ny=1 nz=1 nw=1 raw=5040000000 zfp=315000000 ratio=16 rate=4

     Decompressor: ./zfp -d -1 630000000 -r 4 -x hip -z test.zfp -o test.d64
     Result: fails with a segmentation fault or runtime issue; the suspected root cause is the function decode_ints() in zfp/src/hip_zfp/decode.cuh.

     File info: array dimension 1D; name acd-tst; number of elements 801098891.

If the parameters given to the compressor/decompressor commands were wrong, the expected behavior would be an error message; here, however, the compressor succeeds while the decompressor fails.

Originally posted by @anilbommareddy in https://github.com/LLNL/zfp/issues/85#issuecomment-769292954
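
For context, the raw size in this test case crosses the 32-bit boundary while the compressed size does not: 630000000 doubles occupy 5040000000 bytes, which exceeds UINT_MAX. If any byte offset on the HIP decode path were held in a 32-bit unsigned type it would wrap around, which is one plausible, but unconfirmed, cause of a segmentation fault on data this large. A minimal sketch of the arithmetic:

    #include <cstdio>
    #include <climits>

    int main()
    {
      // Sizes from the failing test case: 630000000 doubles.
      size_t n = 630000000;
      size_t raw_bytes = n * sizeof(double);            // 5040000000 bytes

      // Hypothetical failure mode: a byte offset stored in a 32-bit
      // unsigned type wraps past UINT_MAX and points at the wrong address.
      unsigned int offset32 = (unsigned int)raw_bytes;  // 745032704 after wrapping

      std::printf("raw=%zu UINT_MAX=%u wrapped=%u\n", raw_bytes, UINT_MAX, offset32);
      return 0;
    }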

lindstro commented 3 years ago

I wonder if this issue is related to CUDA bug #121, which was recently fixed, though I would not expect that bug to result in a segmentation fault. I first need to download the data and see if we can reproduce the issue on our end.

GarrettDMorrison commented 1 year ago

This appears to be fixed in the staging branch.

lindstro commented 1 year ago

I confirm that the above commands work (on staging) with the same SDRBench data. Moreover, the serial and HIP backends produce the exact same output, both for compression and decompression.
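
For reference, the same round-trip comparison can be scripted against the zfp C API. In the sketch below, the execution policy zfp_exec_hip is an assumption (the HIP backend lives on a development branch and may spell it differently); the remaining calls are the documented API, mirroring the fixed-rate settings (-d -1 n -r 4) used above:

    #include <cstdio>
    #include <cstring>
    #include <vector>
    #include "zfp.h"

    // Decompress `bytes` of headerless fixed-rate (4 bits/value) zfp data
    // into n doubles with the given execution policy.
    static std::vector<double> decompress(void* data, size_t bytes, size_t n,
                                          zfp_exec_policy policy)
    {
      std::vector<double> out(n);
      zfp_field* field = zfp_field_1d(out.data(), zfp_type_double, n);
      bitstream* bs = stream_open(data, bytes);
      zfp_stream* zfp = zfp_stream_open(bs);
      zfp_stream_set_rate(zfp, 4, zfp_type_double, 1, 0);
      zfp_stream_set_execution(zfp, policy);  // zfp_exec_serial, or the assumed zfp_exec_hip
      zfp_stream_rewind(zfp);
      if (!zfp_decompress(zfp, field))
        std::fprintf(stderr, "decompression failed\n");
      zfp_stream_close(zfp);
      stream_close(bs);
      zfp_field_free(field);
      return out;
    }

    // Bit-for-bit check of the two backends, as confirmed above:
    //   std::vector<double> a = decompress(buf, bytes, n, zfp_exec_serial);
    //   std::vector<double> b = decompress(buf, bytes, n, zfp_exec_hip);
    //   bool identical = std::memcmp(a.data(), b.data(), n * sizeof(double)) == 0;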

I'm closing this issue. Please re-open if you're still experiencing problems.