ComputationalRadiationPhysics / libSplash

libSplash - Simple Parallel file output Library for Accumulating Simulation data using Hdf5
GNU Lesser General Public License v3.0
15 stars 15 forks source link

Attribute meta info #248

Closed Flamefire closed 7 years ago

Flamefire commented 8 years ago

This adds methods for getting the attribute meta info. Possible usage would be:

template<typename T>
void ReadAttribute(DataCollector& dc, T& attribute){
    AttributeInfo info = dc.readAttributeInfo(42, NULL, "fooAttr");
    // Check dimensionality
    assert(info.IsScalar());

    // Assume a trait GetCollectionType that returns the collection type from a
    // given type such as int->ColTypeInt

    // Deprecated functionality for fast transition (potentially unsafe)
    assert(typeid(info.getType()) == typeid(typename GetCollectionType<T>::type));
    assert(sizeof(attribute) == info.getMemSize());
    info.read(&attribute, sizeof(attribute));

    // OR:
    // Cast data to given type. Note: This is the type of each data point. Attribute could still contain
    // an nD array of data. Each of those points may be converted and will be stored in attribute
    // so check dataspace first
    info.read(GetCollectionType<T>::type(), &attribute);
}

One-Liner:

dc.readAttributeInfo(42, NULL, "fooAttr").read(&attribute, sizeof(attribute))

Note: This might introduce unnecessary overhead. A better solution would be returning a C++ attribute class that provides basically the same interface as the AttributeInfo class but stores the HDF identifiers and loads the data on access (most data is only required once, and accessing it in HDF5 already is a copy, so one saves an additional copy) It could also provide a generic read method for getting the data. (Edit: New version does that) However HDF5 already provides a C++ Interface. Not sure if libSplash is still of use...

Flamefire commented 8 years ago

This now also adds read methods and delays reading the meta-data till it is actually used which together greatly improves speed. This fixes #238, #244 and helps #245 (passing a raw buffer which uses the attributes stored type might still mess up endianess)

ccing @slizzered

ax3l commented 8 years ago

dependency merged, pls rebase :)

Flamefire commented 8 years ago

New version done and tested.

Notes:

ax3l commented 8 years ago

Current internal classes using a then deprecated read API:

src/DomainCollector.cpp: In member function ‘virtual splash::Domain splash::DomainCollector::getLocalDomain(int32_t, const char*)’:
src/DomainCollector.cpp:85:90: warning: ‘virtual void splash::SerialDataCollector::readAttribute(int32_t, const char*, const char*, void*, splash::Dimensions*)’ is deprecated (declared at src/include/splash/SerialDataCollector.hpp:327) [-Wdeprecated-declarations]
         readAttribute(id, name, DOMCOL_ATTR_SIZE, domain_size.getPointer(), &mpi_position);
                                                                                          ^
src/DomainCollector.cpp:86:94: warning: ‘virtual void splash::SerialDataCollector::readAttribute(int32_t, const char*, const char*, void*, splash::Dimensions*)’ is deprecated (declared at src/include/splash/SerialDataCollector.hpp:327) [-Wdeprecated-declarations]
         readAttribute(id, name, DOMCOL_ATTR_OFFSET, domain_offset.getPointer(), &mpi_position);
                                                                                              ^
src/DomainCollector.cpp: In member function ‘void splash::DomainCollector::readGlobalSizeFallback(int32_t, const char*, hsize_t*, splash::Dimensions*)’:
src/DomainCollector.cpp:795:83: warning: ‘virtual void splash::SerialDataCollector::readAttribute(int32_t, const char*, const char*, void*, splash::Dimensions*)’ is deprecated (declared at src/include/splash/SerialDataCollector.hpp:327) [-Wdeprecated-declarations]
             readAttribute(id, dataName, DOMCOL_ATTR_GLOBAL_SIZE, data, mpiPosition);
                                                                                   ^
src/DomainCollector.cpp:799:82: warning: ‘virtual void splash::SerialDataCollector::readAttribute(int32_t, const char*, const char*, void*, splash::Dimensions*)’ is deprecated (declared at src/include/splash/SerialDataCollector.hpp:327) [-Wdeprecated-declarations]
             readAttribute(id, dataName, DOMCOL_ATTR_SIZE, local_size, mpiPosition);
                                                                                  ^
src/DomainCollector.cpp: In member function ‘void splash::DomainCollector::readGlobalOffsetFallback(int32_t, const char*, hsize_t*, splash::Dimensions*)’:
src/DomainCollector.cpp:814:85: warning: ‘virtual void splash::SerialDataCollector::readAttribute(int32_t, const char*, const char*, void*, splash::Dimensions*)’ is deprecated (declared at /src/include/splash/SerialDataCollector.hpp:327) [-Wdeprecated-declarations]
             readAttribute(id, dataName, DOMCOL_ATTR_GLOBAL_OFFSET, data, mpiPosition);
Flamefire commented 8 years ago

I did not want to change to much. Especially as this functions might get deprecated themself (there was a discussion to remove the libSplash attributes as openPMD has it all)

I don't have time to change that so if required this might be something for a follow-up.

ax3l commented 8 years ago

I am not sure if we should deprecate so many interfaces.

In the context where you have control over the input data and know exactly what you get this would make one-liner reads to at least 4-liner reads.

ax3l commented 8 years ago

not time critical, don't worry for now.

Flamefire commented 8 years ago

That is the highway to hell and heartbleed. "I know what I get, so give me n Bytes" ;-)

ax3l commented 8 years ago

yes I know, but libSplash has more severe design and usage flaws than being unable to securely read data from unknown sources. actually, it was totally forgotten to read other people's data in the first place.

Flamefire commented 8 years ago

So what is wrong in replacing flawed functions by safe versions and deprecating the unsafe ones? This way libSplash gets safer one PR at a time. Your argument sounds like "libSplash is unsafe. So there is no point making it safe." But that is what bugfixes should do, shouldn't they?

ax3l commented 8 years ago

the flaw is that without providing new helpers you force users to rewrite their code by a factor of 4 for each attribute read.

provide a C++ helper or good interface that does what the PR description example can do so users can replace their read from one one-liner to an other one-liner.

Flamefire commented 7 years ago

Just seen that this is still unmerged but required for parataxis. So about your last comment: There are 2 ways reading data: 1st the safe "4-liner" and second the old interface which is still there. The only way to still use a 1-liner would be using the old interface. However ATM the new interface is only a 3-liner:

    AttributeInfo* info = dc.readAttributeInfo(42, NULL, "fooAttr");

    // Deprecated functionality for fast transition (potentially unsafe)
    info->read(&attribute, sizeof(attribute));
    // OR:
    info->read(GetCollectionType<T>::type(), &attribute);

    delete info;

The only way to make the new interface a 1-liner would be something like: dc.readAttributeInfo(42, NULL, "fooAttr").read(&attribute, sizeof(attribute)). But that requires either smart pointers to be returned or C++11 moveable-only types due to the contained references in the AttributeInfo class. However using a boost/C++11 smart pointer in client code reduces it to 2 lines already.

So in the end my suggestion would be: Remove the deprecation and merge this so you can choose to user either version --> No changes to existing code required but new code may use the safer variants.

ax3l commented 7 years ago

However using a boost/C++11 smart pointer in client code reduces it to 2 lines already.

cool, can you post that example 2-liner?

So in the end my suggestion would be: Remove the deprecation and merge this so you can choose to user either version --> No changes to existing code required but new code may use the safer variants.

yes, let's do that!

Flamefire commented 7 years ago

Sure:

    std::unique_ptr<AttributeInfo> info = dc.readAttributeInfo(42, NULL, "fooAttr");

    // Deprecated functionality for fast transition (potentially unsafe)
    info->read(&attribute, sizeof(attribute));
    // OR:
    info->read(GetCollectionType<T>::type(), &attribute);

However I do like the approach with returning an AttributeInfo instead of a pointer which makes the 1-liner possible. This would need some work in the class (copy-methods, ref-ctr, ...) but is easier to use. The amount of work required is small compared to the gain.

Flamefire commented 7 years ago

Ok I got bored and tried to implement the one-liner and copyable type. However travis fails in a configuration error(?)

One-liner would now be:

dc.readAttributeInfo(42, NULL, "fooAttr").read(&attribute, sizeof(attribute))

And the safer version:

    AttributeInfo info = dc.readAttributeInfo(42, NULL, "fooAttr");
    // Check dimensionality
    assert(info.IsScalar());

    // Cast data to given type. Note: This is the type of each data point. Attribute could still contain
    // an nD array of data. Each of those points may be converted and will be stored in attribute
    // so check dataspace first
    info.read(SomeCollectionType, &attribute);

Note: untested. Just coded this in text editor

ax3l commented 7 years ago

This diff should fix travis for you, but I can't push to this PR (you can allow me by adding this on the right side of the PR's control in the web).

diff --git a/.travis.yml b/.travis.yml
index 4d40a75..2e1889a 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -48,6 +48,7 @@ before_script:
   - sudo apt-get remove -qq libopenmpi* openmpi-bin || { echo "OpenMPI not installed"; }
   - echo "yes" | sudo add-apt-repository ppa:james-page/openmpi
   - echo "yes" | sudo add-apt-repository ppa:axel-huebl/libsplash
+  - sudo rm -rf /var/lib/apt/lists/*
   - sudo apt-get update -qq
   - sudo apt-cache policy
   - sudo apt-cache policy libhdf5-serial-dev
ax3l commented 7 years ago

@Flamefire rebase against dev should fix the travis issue

ax3l commented 7 years ago

When you are finished, can you pls update the PR description with the a deprecated & recommended usage API example? I just copy pasted from below.

Flamefire commented 7 years ago

Will do. There seems to be a Heisenbug in there because before I added the close function the parallel attribute test failed with a "could not close file handle" exception from the HandleMgr which is actually expected as the Attribute is still open which means the file is delay closed. But for parallel operations all processes have to do it at the same time (https://support.hdfgroup.org/HDF5/doc/RM/RM_H5F.html#File-Close). This is why I added the function. But now it seems it is not really needed anymore. I'll still keep it though. The failed "Fix tests" is a strange Travis bug so nothing real there too.

I'll update the description once I'm done.

Edit: Looks like its done now :)

ax3l commented 7 years ago

However HDF5 already provides a C++ Interface. Not sure if libSplash is still of use...

yes, but only for the serial API... :'(