ComputationalRadiationPhysics / libSplash

libSplash - Simple Parallel file output Library for Accumulating Simulation data using Hdf5
GNU Lesser General Public License v3.0

Simple Dataspaces for Attributes #170

Closed ax3l closed 9 years ago

ax3l commented 9 years ago

@f-schmitt-zih Is it currently possible to write an attribute that consists of a fixed number of base types, e.g., 7 doubles? (I am not sure libSplash already supports that; plain HDF5, h5py and ADIOS 1.9+ do.)

This is necessary for initial openPMD support, so users can flavor their output accordingly (see unitDimension attribute).

Python example with h5py:

record.attrs["unitDimension"] = \
    np.array([-3.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0 ], dtype="float64")
    #           L    M    T    I  theta  N    J
    # C / m^3 = A * s / m^3 -> M^-3 * T * I
psychocoderHPC commented 9 years ago

I think at the moment it is not possible, but if you add Double7 here it is.

psychocoderHPC commented 9 years ago

If you need a real array and not a compound type, you can implement a new type like the old splash bool implementation: https://github.com/ComputationalRadiationPhysics/libSplash/blob/e64a7aafadf6143b0ce3a2f5427707a1cb4a2e15/src/include/splash/basetypes/ColTypeBool.hpp There is also a native array implementation: if you use the macro TYPE_ARRAY in user code, it is possible without any changes in splash.

ax3l commented 9 years ago

good idea, let me write a test to see if it's compatible with h5py

psychocoderHPC commented 9 years ago

I updated my last comment!

psychocoderHPC commented 9 years ago

Example of how we used it in PIConGPU: https://github.com/ComputationalRadiationPhysics/picongpu/blob/master/src/picongpu/include/plugins/hdf5/WriteSpecies.hpp#L58

ax3l commented 9 years ago

I saw your update, I am talking about TYPE_ARRAY ;)

ax3l commented 9 years ago

Hm, I expected that. There are too many ways to do that in HDF5.

For arrays, example diff:

diff --git a/tests/AttributesTest.cpp b/tests/AttributesTest.cpp
index adc3887..a5e0ae2 100644
--- a/tests/AttributesTest.cpp
+++ b/tests/AttributesTest.cpp
@@ -101,8 +101,10 @@ void AttributesTest::testDataAttributes()
     dataCollector->writeAttribute(0, ctInt, "datasets/my_dataset", "neg_sum", &neg_sum);

     char c = 'Y';
+    double d[7] = {-3.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0};
     dataCollector->writeAttribute(0, ctInt, "datasets", "sum_at_group", &sum);
     dataCollector->writeAttribute(0, ctChar, "datasets", "my_char", &c);
+    dataCollector->writeAttribute(0, ctDouble7, "datasets", "unitDimension", d);

     delete[] dummy_data;
     dummy_data = NULL;
@@ -153,11 +155,15 @@ void AttributesTest::testDataAttributes()
     CPPUNIT_ASSERT(sum == old_sum);
     CPPUNIT_ASSERT(neg_sum == -old_sum);

+    double dr[7] = {0., 0., 0., 0., 0., 0., 0.};
     dataCollector->readAttribute(0, "datasets", "sum_at_group", &sum);
     dataCollector->readAttribute(0, "datasets", "my_char", &c);
+    dataCollector->readAttribute(0, "datasets", "unitDimension", dr);

     CPPUNIT_ASSERT(sum == old_sum);
     CPPUNIT_ASSERT(c == 'Y');
+    for (int i = 0; i < 7; i++)
+        CPPUNIT_ASSERT(dr[i] == d[i]);

     dataCollector->close();
 }
diff --git a/tests/include/AttributesTest.h b/tests/include/AttributesTest.h
index 65fd676..ef65d7a 100644
--- a/tests/include/AttributesTest.h
+++ b/tests/include/AttributesTest.h
@@ -30,6 +30,8 @@

 using namespace splash;

+TYPE_ARRAY(MyDouble7, H5T_NATIVE_DOUBLE, double, 7);
+
 class AttributesTest  : public CPPUNIT_NS::TestFixture
 {
     CPPUNIT_TEST_SUITE(AttributesTest);
@@ -52,6 +54,7 @@ private:
     ColTypeDimArray ctDimArray;
     ColTypeString ctString;
     ColTypeString ctString4;
+    ColTypeMyDouble7Array ctDouble7;
     DataCollector *dataCollector;
 };

diff --git a/tests/readBoolChar.py b/tests/readBoolChar.py
index 2ff24cb..3c65022 100755
--- a/tests/readBoolChar.py
+++ b/tests/readBoolChar.py
@@ -24,7 +24,7 @@
 import h5py
 import numpy as np

-# bool compatible data sets
+# bool compatible data sets ###################################################
 f = h5py.File("h5/testWriteRead_0_0_0.h5", "r")
 data = f["data/10/deep/folders/data_bool"]

@@ -38,8 +38,14 @@ for i in np.arange(len):

 f.close()

-# single char compatible attributes
+# compatible attributes #######################################################
 f = h5py.File("h5/attributes_0_0_0.h5", "r")
+
+# array attributes
+d = f["data/0/datasets"].attrs["unitDimension"]
+print(d, type(d), d.dtype)
+
+# single char compatible attributes
 c = f["data/0/datasets"].attrs["my_char"]

 # h5py, as of 2.5.0, does not know char and

libSplash creates (h5dump -H)

            ATTRIBUTE "unitDimension" {
               DATATYPE  H5T_ARRAY { [7] H5T_IEEE_F64LE }
               DATASPACE  SCALAR
            }

which cannot be read via h5py:

Traceback (most recent call last):
    d = f["data/0/datasets"].attrs["unitDimension"]
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/attrs.py", line 58, in __getitem__
    attr.read(arr)
  File "h5a.pyx", line 350, in h5py.h5a.AttrID.read (h5py/h5a.c:4626)
TypeError: Numpy array rank 1 must match dataspace rank 0.

but it should actually be:

                  ATTRIBUTE "unitDimension" {
                     DATATYPE  H5T_IEEE_F64LE
                     DATASPACE  SIMPLE { ( 7 ) / ( 7 ) }
                  }

Interesting: that is the difference between the two in HDFView (screenshot: arrayattribsplash)
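For comparison, writing the same attribute from h5py already produces the desired simple-dataspace layout; a minimal sketch (file and group names are arbitrary):

```python
import h5py
import numpy as np

# h5py stores a 1D numpy array attribute with DATASPACE SIMPLE { ( 7 ) / ( 7 ) },
# not as a scalar H5T_ARRAY, so it round-trips without the rank error above
with h5py.File("unitdim_demo.h5", "w") as f:
    g = f.create_group("data")
    g.attrs["unitDimension"] = np.array(
        [-3.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0], dtype="float64")

with h5py.File("unitdim_demo.h5", "r") as f:
    d = f["data"].attrs["unitDimension"]
    print(d.shape, d.dtype)  # (7,) float64
```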

ax3l commented 9 years ago

trying again with TYPE_COMPOUND...

writing creates an

HDF5-DIAG: Error detected in HDF5 (1.8.13) thread 0:
  #000: ../../../src/H5Tcompound.c line 368 in H5Tinsert(): no member name
    major: Invalid arguments to routine
    minor: Bad value
AttributesTest::testDataAttributes : OK
AttributesTest::testArrayTypes : OK
OK (2)

-> fixed, just added a new identifier here

and reading via h5py yields

(-3.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0)
 <type 'numpy.void'>
  dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('u', '<f8'), ('v', '<f8'), ('w', '<f8'), ('i', '<f8')])

which is still not the result that I was looking for.

h5dump:

               DATATYPE  H5T_COMPOUND {
                  H5T_IEEE_F64LE "x";
                  H5T_IEEE_F64LE "y";
                  H5T_IEEE_F64LE "z";
                  H5T_IEEE_F64LE "u";
                  H5T_IEEE_F64LE "v";
                  H5T_IEEE_F64LE "w";
                  H5T_IEEE_F64LE "i";
               }
               DATASPACE  SCALAR
            }

HDFView (screenshot: splashcompound)

ax3l commented 9 years ago

What we should actually use for attributes that are n-times-same-type are HDF5 Simple Dataspaces (as we use them for data sets):

A simple dataspace, H5S_SIMPLE, is a multidimensional array of elements.
The dimensionality of the dataspace (or the rank of the array) is fixed and is
defined at creation time. The size of each dimension can grow during the life
time of the dataspace from the current size up to the maximum size. Both the
current size and the maximum size are specified at creation time. The sizes of
dimensions at any particular time in the life of a dataspace are called the current
dimensions, or the dataspace extent. They can be queried along with the maximum sizes.

but we could also add a TYPE_SIMPLE and a basetypes_simple.hpp and leave the old TYPE_ARRAY for other purposes. (Nevertheless, I am not sure TYPE_ARRAY can be read with h5py; we should check our particle_info data set, maybe it's only a problem with attributes.)
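The scalar-vs-simple distinction can be inspected from h5py's low-level API; a small sketch (file and attribute names are made up for illustration):

```python
import h5py
import numpy as np

with h5py.File("rank_demo.h5", "w") as f:
    f.attrs["scalar_attr"] = 1.0                           # H5S_SCALAR dataspace
    f.attrs["simple_attr"] = np.zeros(7, dtype="float64")  # H5S_SIMPLE dataspace

with h5py.File("rank_demo.h5", "r") as f:
    for name in (b"scalar_attr", b"simple_attr"):
        space = h5py.h5a.open(f.id, name).get_space()
        # rank 0 means H5S_SCALAR; rank >= 1 means H5S_SIMPLE
        print(name.decode(), "rank:", space.get_simple_extent_ndims())
```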

ax3l commented 9 years ago

I am implementing TYPE_SIMPLE already... cool, HDF5 even allows multi-dimensional simple dataspaces ^^ We currently use them directly in DCDataSet.cpp.

Actually, these are dataspaces, so they can be used with every type.

So what I am looking for is non-scalar dataspace support in attributes (not only in data sets).

ax3l commented 9 years ago

Working quick & dirty preview: https://github.com/ax3l/libSplash/commit/b64d3b50077416ca338aa74deeb927a1e3834c0b

double d[7] = {-3.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0};
dataCollector->writeAttribute(0, ctDouble, "datasets", "unitDimension", d, 1u, Dimensions(7,0,0));
// ...
double dr[7] = {0., 0., 0., 0., 0., 0., 0.};
dataCollector->readAttribute(0, "datasets", "unitDimension", dr);

h5dump -H h5/attributes_0_0_0.h5

            ATTRIBUTE "unitDimension" {
               DATATYPE  H5T_IEEE_F64LE
               DATASPACE  SIMPLE { ( 7 ) / ( 7 ) }
            }

note: the second "7" is the maximum extent (fixed, not resizable)

psychocoderHPC commented 9 years ago

HDFView has some bugs with compound types, and maybe with arrays too, so don't trust HDFView!

ax3l commented 9 years ago

HDFView has some bugs with compound types, and maybe with arrays too, so don't trust HDFView!

that's why I posted the trustworthy h5dump outputs and the corresponding representations in h5py ;)

Nevertheless, the HDFView output was totally correct for the examples I showed. But I was looking for another representation (simple dataspaces, not compound or array types).

ax3l commented 9 years ago

closed with #171