2D Point data curiously compresses way better at 1D

LLNL / zfp

Compressed numerical arrays that support high-speed random access

http://zfp.llnl.gov

BSD 3-Clause "New" or "Revised" License


Issue #225

Closed: b100dian closed this issue 5 months ago

b100dian commented 5 months ago

Hello, and thank you for this project!

I'm trying to understand if I'm using this in a wrong way:

I have a single float array of 432 f32 values (attached) that represents 2D data points. The result of a 1D compression is:

$ zfp -f -1 432 -R -s -i Floats2.data
type=float nx=432 ny=1 nz=1 nw=1 raw=1728 zfp=1624 ratio=1.06 rate=30.07 rmse=0 nrmse=0 maxe=0 psnr=inf

But the 2D result is WAY larger:

$ zfp -f -2 2 216 -R -s -i Floats2.data
type=float nx=2 ny=216 nz=1 nw=1 raw=1728 zfp=2480 ratio=0.697 rate=45.93 rmse=0 nrmse=0 maxe=0 psnr=inf

(I also tried -2 216 2, with even worse results.)

What I don't understand is whether any padding is done, or why the 2D compression would be so much worse.
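On the padding question, here is a hedged sketch based on our understanding of zfp's block structure. It is not the maintainers' diagnosis. zfp partitions a d-dimensional array into blocks of 4^d values (4 in 1D, 4×4 = 16 in 2D) and pads blocks that extend past the array boundary. The arithmetic below counts blocks and coded values, padding included, for each shape tried in this issue. `block_stats` is a hypothetical helper, not part of zfp.

```python
import math

BLOCK = 4  # zfp block edge length; a d-dimensional block holds 4**d values


def block_stats(dims):
    """Return (number of blocks, values coded including padding) for a zfp shape.

    Hypothetical helper: assumes every partial block is padded to a full 4**d block.
    """
    blocks = 1
    for n in dims:
        blocks *= math.ceil(n / BLOCK)  # partial blocks along each axis round up
    coded = blocks * BLOCK ** len(dims)
    return blocks, coded


for dims in [(432,), (2, 216), (216, 2)]:
    blocks, coded = block_stats(dims)
    print(dims, "blocks:", blocks, "coded values:", coded, "real values:", math.prod(dims))
```

Under this assumption, the 1D layout fills 108 blocks exactly. Both 2D layouts use 54 blocks of 16 values, so 864 values are coded for only 432 real ones: one axis of length 2 leaves every 4×4 block half full. Padding alone does not explain why 216×2 does worse than 2×216. That depends on how the values correlate along each axis.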

Floats2.data.zip

Thanks!

lindstro commented 5 months ago

From briefly looking at the data, it appears to be a list of pairs of numbers (2-vectors). Here are the first 48 values:

This suggests that the domain of this field is one-dimensional while the range is two-dimensional. zfp compresses only scalar fields, so you need to separate the data into two different arrays and compress them independently. This is also one of the zfp FAQs.
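A minimal sketch of the deinterleaving step, using only the Python standard library. It assumes the raw file holds native-endian 32-bit floats in the order x0, y0, x1, y1, and so on. The function names and output file names are hypothetical illustrations, not part of zfp.

```python
import array


def deinterleave(values):
    """Split interleaved (x0, y0, x1, y1, ...) samples into two scalar fields."""
    if len(values) % 2:
        raise ValueError("expected an even number of values (x, y pairs)")
    return values[0::2], values[1::2]  # even offsets are x, odd offsets are y


def split_file(path_in, path_x, path_y):
    """Read raw native-endian float32 data and write x and y to separate raw files."""
    data = array.array("f")
    assert data.itemsize == 4, "expected a 4-byte C float"
    with open(path_in, "rb") as f:
        data.frombytes(f.read())
    xs, ys = deinterleave(data)
    with open(path_x, "wb") as f:
        xs.tofile(f)
    with open(path_y, "wb") as f:
        ys.tofile(f)
    return len(xs), len(ys)
```

For the 432-value attachment, this would produce two 216-value fields. Each could then be compressed on its own, e.g. `zfp -f -1 216 -R -s -i x.data`. Writing `xs` followed by `ys` into a single file instead gives the concatenated 432-value layout used in the reply below.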

After deinterleaving the two scalar fields, we obtain this:

This reorganized data set compresses losslessly to 1480 bytes using -1 432. On closer inspection, though, the data now appears to consist of triplets of related values. Compressing it using -2 3 144 further reduces the compressed size to 1320 bytes.
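The sketch below shows how `-2 3 144` lines up with those triplets. It relies on zfp's ordering convention, where x varies fastest, so element (i, j) of an nx × ny array sits at flat offset i + nx·j. With nx = 3, each row j of the 2D view holds one triplet. Rows of consecutive triplets then sit next to each other along y, where zfp's block transform can exploit their correlation. `triplet_rows` and `at` are hypothetical helpers for illustration only.

```python
def triplet_rows(values, nx=3):
    """View a flat sequence as an nx-by-ny zfp-style 2D array (x varies fastest)."""
    if len(values) % nx:
        raise ValueError("length must be a multiple of nx")
    ny = len(values) // nx
    return [values[nx * j : nx * (j + 1)] for j in range(ny)]  # row j = one triplet


def at(values, i, j, nx=3):
    """Element (i, j) of the 2D view: x index i, y index j."""
    return values[i + nx * j]
```

Applied to the 432 deinterleaved values, `triplet_rows` yields 144 rows of 3, matching the `-2 3 144` dimensions passed to the command-line tool.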

I suspect zfp could do even better here if the data were organized better. Do you know what the data represents and what its intrinsic dimensionality is?

b100dian commented 5 months ago

Thanks for the fast reply; sorry, it was the end of the day for me. It was not immediately apparent to me that "zfp compresses only scalar fields, so you need to separate the data into two different arrays", and I thought the X/Y pairs would simply be the second dimension of a 2D array.

Do you know what the data represents and what its intrinsic dimensionality is?

The data represents X and Y float coordinates for cubic Bezier curves. Although such a curve is described by four points, only three are stored per curve because each curve reuses the end point of the previous one, as in the web API bezierCurveTo.

This would explain why you're seeing 1) two dimensions (X and Y) and 2) triplets of data (the three stored points of each cubic curve).
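The storage scheme described above can be sketched as follows. Each stored triplet is (control point 1, control point 2, end point), and a full four-point cubic segment is recovered by prepending the previous segment's end point, as bezierCurveTo does on a canvas path. The starting point of the first segment is assumed to come from elsewhere (e.g. a preceding move-to), since 216 points divide evenly into 72 triplets. `expand_segments` and `cubic_point` are hypothetical names. The evaluation uses the standard Bernstein form of a cubic Bezier curve.

```python
from typing import Iterator, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point, Point, Point]


def expand_segments(start: Point, points: List[Point]) -> Iterator[Segment]:
    """Turn stored (cp1, cp2, end) triplets into full (p0, p1, p2, p3) segments.

    Each segment starts where the previous one ended; `start` is assumed
    to be supplied separately for the first segment.
    """
    if len(points) % 3:
        raise ValueError("expected a multiple of three points")
    p0 = start
    for k in range(0, len(points), 3):
        p1, p2, p3 = points[k], points[k + 1], points[k + 2]
        yield p0, p1, p2, p3
        p0 = p3  # the end point becomes the next segment's start point


def cubic_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    s = 1.0 - t
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1])
```

When both control points lie on the straight line between a segment's end points, the evaluated curve stays on that line. This is consistent with the later remark in this thread that the example contains no actual curves.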

lindstro commented 5 months ago

OK. I guess I don't understand why the Bezier control points are essentially constant for each curve segment but there are obvious jumps between "consecutive" triplets. Anyway, there's potentially some massaging of the data (e.g., reordering) that can be done to improve compression.

Can I go ahead and close this issue?

b100dian commented 5 months ago

Yes. In this particular example the control points are effectively nonexistent, meaning there are no actual curves.

Thank you for your help and for pointing out the FAQ entry I missed. Closing this for now.