HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

Compression not handled #10

Closed: ahalota closed this issue 7 years ago

ahalota commented 8 years ago

It looks like 'gzip' compression is not taking effect, even though I set it when creating the dataset. After adding one entry, whose data should be only about 1/20 the size of my entire file, the file size has doubled.

grpOut.create_dataset("C_PEAT", data=PEAT_emissions, compression="gzip")

This is the same method I used to create my entire file originally.
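One way to check whether the filter was actually applied, without dumping the file, is to read back the dataset's compression properties. Here is a minimal sketch using h5py against a local scratch file (h5pyd mirrors the h5py API, so the same calls should work through h5pyd.File against a server); the file path and the all-zero stand-in for PEAT_emissions are illustrative, not from the original script:

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative stand-ins: a scratch file and an all-zero 720 x 1440 grid
# in place of the real PEAT_emissions array.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    grpOut = f.create_group("2015")
    PEAT_emissions = np.zeros((720, 1440), dtype="f8")
    dset = grpOut.create_dataset("C_PEAT", data=PEAT_emissions,
                                 compression="gzip")
    # If the filter took effect, these report the gzip filter and its level.
    print(dset.compression)       # "gzip"
    print(dset.compression_opts)  # 4 (h5py's default gzip level)
```

If dset.compression comes back None, the filter was silently dropped, which would match a contiguous, unfiltered layout on disk.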

jreadey commented 8 years ago

What does h5dump -p -H tell you?

ahalota commented 8 years ago

Here's a subset of the header. The "1997" group was created using h5py; the "2015" group was created using h5pyd.

HDF5 "GFED_Annual.h5" {
GROUP "/" {
   GROUP "1997" {
      DATASET "C_AGRI" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 720, 1440 ) / ( 720, 1440 ) }
         STORAGE_LAYOUT {
            CHUNKED ( 45, 90 )
            SIZE 714077 (11.616:1 COMPRESSION)
         }
         FILTERS {
            COMPRESSION DEFLATE { LEVEL 4 }
         }
         FILLVALUE {
            FILL_TIME H5D_FILL_TIME_ALLOC
            VALUE  0
         }
         ALLOCATION_TIME {
            H5D_ALLOC_TIME_INCR
         }
         ATTRIBUTE "long_name" {
            DATATYPE  H5T_STRING {
               STRSIZE 48;
               STRPAD H5T_STR_NULLPAD;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
         ATTRIBUTE "units" {
            DATATYPE  H5T_STRING {
               STRSIZE 3;
               STRPAD H5T_STR_NULLPAD;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
      }
   }
   GROUP "2015" {
      DATASET "C_AGRI" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 720, 1440 ) / ( 720, 1440 ) }
         STORAGE_LAYOUT {
            CONTIGUOUS
            SIZE 8294400
            OFFSET 63466808
         }
         FILTERS {
            NONE
         }
         FILLVALUE {
            FILL_TIME H5D_FILL_TIME_IFSET
            VALUE  0
         }
         ALLOCATION_TIME {
            H5D_ALLOC_TIME_LATE
         }
         ATTRIBUTE "long_name" {
            DATATYPE  H5T_STRING {
               STRSIZE 48;
               STRPAD H5T_STR_NULLPAD;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
         ATTRIBUTE "units" {
            DATATYPE  H5T_STRING {
               STRSIZE 3;
               STRPAD H5T_STR_NULLPAD;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
      }
   }
}
}
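The two storage layouts above explain the size jump: the 1997 dataset is chunked and DEFLATE-compressed (714077 bytes stored, an 11.6:1 ratio), while the 2015 dataset is contiguous and unfiltered (8294400 bytes, the full 720 x 1440 x 8). HDF5's DEFLATE filter is the same algorithm as stdlib zlib, so the arithmetic can be sanity-checked with a sketch (an all-zero buffer stands in for the real emissions grid, so the ratio here is far better than real data would get):

```python
import zlib

ROWS, COLS, ITEMSIZE = 720, 1440, 8        # float64 grid from the dump above
raw_size = ROWS * COLS * ITEMSIZE
print(raw_size)                            # 8294400, the CONTIGUOUS SIZE

# DEFLATE level 4, matching "COMPRESSION DEFLATE { LEVEL 4 }" in the dump.
packed = zlib.compress(bytes(raw_size), level=4)
print(len(packed) < raw_size)              # True: the filter shrinks the data
```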
jreadey commented 8 years ago

That looks compressed. Can you compare it with a file generated by your original h5py code?

ahalota commented 8 years ago

The first piece, 1997, looks compressed, and that is from my h5py code. But 2015, the part I created with h5pyd, looks different. Does that still count as compressed?

jreadey commented 8 years ago

Ok, got it. That seems like a bug; I'll look into it.

jreadey commented 7 years ago

This should be working now (at least for gzip; I haven't set up test cases for the other compression formats).