areaDetector / ADCore

The home of the core components of the EPICS areaDetector software. It includes base classes for drivers and code for all of the standard plugins.
https://areadetector.github.io/areaDetector

HDF5 plugin to support defining hard links in XML layout #31

Closed ulrikpedersen closed 10 years ago

ulrikpedersen commented 10 years ago

In order for the HDF5 plugin to be able to write truly NeXus-compatible files, we have identified the need for an additional feature which allows defining HDF5 hard links to specific datasets in the XML layout definition.

Reference comments in #28 and #29

Pull request #30 has added a comment where a 'default' hard link to the main dataset needs to go.

prjemian commented 10 years ago

Here is an example from an area detector 1.x template file that I have used in the past to indicate to my post-processing step how to construct the link:

    <link_rules type="UserGroup">
        <link>
            <Attr name="source" outtype="NX_CHAR" type="CONST">/entry/data</Attr>
            <Attr name="target" outtype="NX_CHAR" type="CONST">/entry/instrument/detector/data</Attr>
            <!-- make data appear under detector, as well -->
        </link>
    </link_rules>

The link_rules element is specified only once in a template file and provides a list of links to be constructed after all the other HDF5 data and groups are created. Each link has two attributes as shown. To match the interface of the H5Lcreate_hard() function,

    herr_t H5Lcreate_hard( hid_t obj_loc_id, const char *obj_name, hid_t link_loc_id, const char *link_name, hid_t lcpl_id, hid_t lapl_id )

it might be easier to rewrite:

    <link_rules type="UserGroup">
        <link>
            <Attr name="source_group" outtype="NX_CHAR" type="CONST">/entry</Attr>
            <Attr name="source_name" outtype="NX_CHAR" type="CONST">data</Attr>
            <Attr name="target_group" outtype="NX_CHAR" type="CONST">/entry/instrument/detector</Attr>
            <Attr name="target_name" outtype="NX_CHAR" type="CONST">data</Attr>
            <!-- make data appear under detector, as well -->
        </link>
    </link_rules>

see: http://www.hdfgroup.org/HDF5/doc/RM/RM_H5L.html#Link-CreateHard
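The split form maps directly onto the H5Lcreate_hard() arguments: source_group/source_name name the existing object (obj_loc_id/obj_name) and target_group/target_name name the new link (link_loc_id/link_name). A minimal sketch of reading such rules with Python's standard xml.etree.ElementTree, assuming the attribute names proposed above; parse_link_rules is a hypothetical helper, not part of areaDetector:

```python
import posixpath
import xml.etree.ElementTree as ET

# The split-form rule proposed above (assumed attribute names:
# source_group, source_name, target_group, target_name).
LINK_RULES = """\
<link_rules type="UserGroup">
    <link>
        <Attr name="source_group" outtype="NX_CHAR" type="CONST">/entry</Attr>
        <Attr name="source_name" outtype="NX_CHAR" type="CONST">data</Attr>
        <Attr name="target_group" outtype="NX_CHAR" type="CONST">/entry/instrument/detector</Attr>
        <Attr name="target_name" outtype="NX_CHAR" type="CONST">data</Attr>
        <!-- make data appear under detector, as well -->
    </link>
</link_rules>
"""


def parse_link_rules(xml_text):
    """Hypothetical helper: return one tuple per <link>, in H5Lcreate_hard()
    argument order (obj group, obj name, link group, link name)."""
    root = ET.fromstring(xml_text)  # XML comments are dropped by the default parser
    rules = []
    for link in root.iter("link"):
        # Map each <Attr name="..."> to its stripped text content.
        fields = {a.get("name"): (a.text or "").strip() for a in link.iter("Attr")}
        rules.append((fields["source_group"], fields["source_name"],
                      fields["target_group"], fields["target_name"]))
    return rules


for obj_group, obj_name, link_group, link_name in parse_link_rules(LINK_RULES):
    # HDF5 paths always use "/", so posixpath is used regardless of the host OS.
    existing = posixpath.join(obj_group, obj_name)
    new_link = posixpath.join(link_group, link_name)
    print(f"{existing} -> {new_link}")
```

For the rule above this prints `/entry/data -> /entry/instrument/detector/data`. Keeping the existing object first and the new link second mirrors the H5Lcreate_hard() argument order, which makes it harder to swap the two paths by accident.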

prjemian commented 10 years ago

Python code to create a specific hard link in the file named by h5File:

    import h5py

    def create_missing_link(h5File):
        # "a" opens the file read/write, so a new link can be added
        with h5py.File(h5File, "a") as h5:
            parent = h5['/entry/data']
            ds = parent['data']
            makeLink(parent, ds, '/entry/instrument/detector/data')

    def makeLink(parent, sourceObject, targetName):
        '''
        create an internal NeXus (hard) link in an HDF5 file
        '''
        if 'target' not in sourceObject.attrs:
            # NeXus link, NOT an HDF5 link!
            # "target" records the original path; attributes live on the
            # object, so every hard link to it sees the same value.
            sourceObject.attrs["target"] = str(sourceObject.name)
        # Assigning an existing object to a new path creates an HDF5 hard link.
        parent.file[targetName] = sourceObject

MarkRivers commented 10 years ago

Folks,

I am starting to work on implementing links into the HDF5 file plugin.

I have a question about what level links should appear in the XML file. Here is what I am asking based on the hdf5_xml_layout_schema.xsd file.

I defined a hardlink element type.

I can think of 2 ways to do this. A hardlink can be under a "group", or a hardlink could be a top-level element. I think the first syntax is easier to use, but it forces hardlinks to belong to groups. Is this OK?

Mark
prjemian commented 10 years ago

I like the first syntax (links defined under the group); it will look more obvious.

ulrikpedersen commented 10 years ago

Hi Mark, Thanks for looking into this (sorry I just noticed your comment on this entry after I sent my email...)

OK, so I agree with Pete: the first syntax is best. In fact I think this is the only correct way. All HDF elements (groups, datasets, hardlinks) must belong to groups - even if the group is the root ('/') of the tree. With your first option, you are enforcing this - whereas the second option leaves some ambiguity.

So, just to confirm, the hardlinkType will look like this:

<!-- The hardlink element -->
<xs:complexType name="hardlinkType">
    <xs:attribute name="name" type="xs:string" use="required" />
    <xs:attribute name="source" type="xs:string" use="required" />
</xs:complexType>

And it will be added as an optional entry to the groupType, at the same level as datasets, attributes and other groups:

<!-- The group element can contain other elements: datasets, attributes, hardlinks, and other groups -->
<xs:complexType name="groupType">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="dataset" type="datasetType" />
        <xs:element name="attribute" type="attributeType" />
        <xs:element name="hardlink" type="hardlinkType" />
        <xs:element name="group" type="groupType" />
    </xs:choice>
    <xs:attribute name="name" type="xs:string" use="required" />
    <xs:attribute name="ndattr_default" type="xs:boolean" use="optional" default="false" />
</xs:complexType> 

Finally, it can be used like this example (where a dataset has been defined at /entry/instrument/detector/data):

<group name="data">
    <attribute name="NX_class" source="constant" value="NXdata" type="string"></attribute>
    <!-- needs a hard link from /entry/instrument/detector/data to /entry/data/data -->
    <hardlink name="data" source="/entry/instrument/detector/data"></hardlink>
</group> <!-- end group data -->

Cheers, Ulrik
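With hardlink elements nested in groups, a layout reader only has to track the path of the enclosing group: the new link's full path is that group path plus the hardlink's name, and source names the existing object. A minimal sketch, assuming the schema confirmed above and using Python's standard xml.etree.ElementTree; collect_hardlinks is a hypothetical helper, and the outer entry group is added only to make the fragment a complete tree:

```python
import posixpath
import xml.etree.ElementTree as ET

# Fragment in the proposed layout syntax. The outer "entry" group is an
# assumption added so the example is a complete tree.
LAYOUT = """\
<group name="entry">
    <group name="data">
        <attribute name="NX_class" source="constant" value="NXdata" type="string"></attribute>
        <hardlink name="data" source="/entry/instrument/detector/data"></hardlink>
    </group>
</group>
"""


def collect_hardlinks(group, parent_path="/"):
    """Hypothetical helper: return (new_link_path, existing_object_path) pairs.

    A <hardlink> lives in its enclosing <group>, so the new link's path is the
    group path joined with the hardlink's name. Other children such as
    <dataset> and <attribute> are ignored here.
    """
    # posixpath.join("/", "/") stays "/", so a root group named "/" also works.
    path = posixpath.join(parent_path, group.get("name"))
    links = []
    for child in group:
        if child.tag == "hardlink":
            links.append((posixpath.join(path, child.get("name")), child.get("source")))
        elif child.tag == "group":
            links.extend(collect_hardlinks(child, path))
    return links


print(collect_hardlinks(ET.fromstring(LAYOUT)))
```

This prints `[('/entry/data/data', '/entry/instrument/detector/data')]`. Collecting the pairs first and creating the links only after every dataset exists fits the requirement that the target object of H5Lcreate_hard() must already be in the file.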

MarkRivers commented 10 years ago

Folks,

I now have this working. I've changed the default layout to include the hardlink.

This is an h5dump of a resulting file:

    corvette:~/scratch>h5dump --header hdf5_no_xml_file_012.h5
    HDF5 "hdf5_no_xml_file_012.h5" {
    GROUP "/" {
    GROUP "entry" {
       ATTRIBUTE "NX_class" {
          DATATYPE  H5T_STRING {
                STRSIZE 7;
                STRPAD H5T_STR_NULLTERM;
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
          DATASPACE  SCALAR
       }
       GROUP "data" {
          ATTRIBUTE "NX_class" {
             DATATYPE  H5T_STRING {
                   STRSIZE 6;
                   STRPAD H5T_STR_NULLTERM;
                   CSET H5T_CSET_ASCII;
                   CTYPE H5T_C_S1;
                }
             DATASPACE  SCALAR
          }
          DATASET "data" {
             DATATYPE  H5T_STD_U8LE
             DATASPACE  SIMPLE { ( 1024, 1024 ) / ( 1024, 1024 ) }
             ATTRIBUTE "NX_class" {
                DATATYPE  H5T_STRING {
                      STRSIZE 3;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             ATTRIBUTE "signal" {
                DATATYPE  H5T_STD_I32LE
                DATASPACE  SCALAR
             }
          }
       }
       GROUP "instrument" {
          ATTRIBUTE "NX_class" {
             DATATYPE  H5T_STRING {
                   STRSIZE 12;
                   STRPAD H5T_STR_NULLTERM;
                   CSET H5T_CSET_ASCII;
                   CTYPE H5T_C_S1;
                }
             DATASPACE  SCALAR
          }
          GROUP "NDAttributes" {
             ATTRIBUTE "NX_class" {
                DATATYPE  H5T_STRING {
                      STRSIZE 12;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             ATTRIBUTE "hostname" {
                DATATYPE  H5T_STRING {
                      STRSIZE 25;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
          }
          GROUP "detector" {
             ATTRIBUTE "NX_class" {
                DATATYPE  H5T_STRING {
                      STRSIZE 10;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             GROUP "NDAttributes" {
                ATTRIBUTE "NX_class" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 12;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                DATASET "ColorMode" {
                   DATATYPE  H5T_STD_I32LE
                   DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                   ATTRIBUTE "description" {
                      DATATYPE  H5T_STRING {
                            STRSIZE 10;
                            STRPAD H5T_STR_NULLTERM;
                            CSET H5T_CSET_ASCII;
                            CTYPE H5T_C_S1;
                         }
                      DATASPACE  SCALAR
                   }
                   ATTRIBUTE "source" {
                      DATATYPE  H5T_STRING {
                            STRSIZE 6;
                            STRPAD H5T_STR_NULLTERM;
                            CSET H5T_CSET_ASCII;
                            CTYPE H5T_C_S1;
                         }
                      DATASPACE  SCALAR
                   }
                }
             }
             DATASET "data" {
                HARDLINK "/entry/data/data"
             }
          }
          GROUP "performance" {
             DATASET "timestamp" {
                DATATYPE  H5T_IEEE_F64LE
                DATASPACE  SIMPLE { ( 1, 5 ) / ( 1, 5 ) }
             }
          }
       }
    }
    }
    }

Note that it appears that the data is in /entry/data/data and that /entry/instrument/detector/data is the hardlink. This is OK: both of those paths are really hard links to the same location in the file, and h5dump needs to decide which one to display as a link and which to display as the "data".

I've committed a new branch "rivers_hdf5" on github. Please try to test it and let me know of any problems.

Mark
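Which path h5dump prints as DATASET depends on the order in which it walks the file, not on which path created the object. The toy model below is an assumption, not h5dump's code: h5dump_labels is a hypothetical function that visits paths depth-first in ascending name order (h5dump's default sort) and labels the first path that reaches an object as DATASET and every later one as HARDLINK. Because "data" sorts before "instrument" under /entry, /entry/data/data comes first even though the plugin created the dataset at /entry/instrument/detector/data:

```python
def h5dump_labels(links):
    """Hypothetical toy model (not h5dump's source) of hard-link labelling.

    `links` maps every absolute path to an object id; several paths may share
    one id. Paths are visited depth-first in ascending name order, assumed to
    match h5dump's default sort. The first path that reaches an object is
    labelled DATASET; later paths are labelled HARDLINK to that first path.
    """
    first_seen = {}  # object id -> first path that reached it
    labels = {}
    # Sorting by path components gives a depth-first, name-ordered walk.
    for path in sorted(links, key=lambda p: p.split("/")):
        obj = links[path]
        if obj in first_seen:
            labels[path] = 'HARDLINK "%s"' % first_seen[obj]
        else:
            first_seen[obj] = path
            labels[path] = "DATASET"
    return labels


# One object (id 1): created at /entry/instrument/detector/data, then linked
# at /entry/data/data. The model, like HDF5, keeps no record of creation order.
layout = {
    "/entry/instrument/detector/data": 1,
    "/entry/data/data": 1,
}
for path, label in h5dump_labels(layout).items():
    print(path, label)
```

This labels /entry/data/data as DATASET and /entry/instrument/detector/data as `HARDLINK "/entry/data/data"`, the same split seen in the dump above.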


ulrikpedersen commented 10 years ago

Thanks Mark,

I just ran a basic test of our use-case:

The result is a hardlinked dataset with all the attributes of the main dataset. I am a little surprised to see that the hardlink appears as the main dataset - but I guess that is just down to it being an actual hard link; i.e. indistinguishable from the real data to a client tool. I might ask around about that behavior.

Here is the h5ls output (notice /entry/data/data and /entry/detector/data1; the latter is the main dataset):

    [up45@pc0009 testHdfXml]$ h5ls -r out.h5
    /                        Group
    /entry                   Group
    /entry/attributes        Group
    /entry/data              Group
    /entry/data/data         Dataset {10/Inf, 1024, 1024}
    /entry/detector          Group
    /entry/detector/data1    Dataset, same as /entry/data/data
    /entry/detector/data2    Dataset {1/Inf, 1024, 1024}
    /entry/detector/data3    Dataset {1/Inf, 1024, 1024}
    /entry/instruments       Group
    /entry/instruments/ColorMode Dataset {10}
    /entry/instruments/timestamp Dataset {10, 5}

Cheers, Ulrik

ulrikpedersen commented 10 years ago

Just a quick comment to let you know that I have discussed this funny behaviour of hard links with my colleague Tobias Richter, who has a lot of NeXus/HDF5 experience.

What confused me was that the hardlink in my example would appear as the main dataset when viewed with h5ls or h5dump. Tobias confirmed that this is indeed a behaviour that he has noticed when working with NeXus/HDF5 files.

We think it may be due to datasets and hard links sharing the same object ID (like inodes on a filesystem). The applications h5dump and h5ls iterate through the file's object IDs and will probably select the first one they find as the 'main' dataset, with all others sharing that object ID being labelled as references...

It really doesn't make any difference, as all the datasets are, under the hood, just references to the same object in the file. Apparently, even if you delete the main dataset but still have a hard link pointing to it, the real data is not deleted (or lost) from disk.
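Ulrik's inode analogy can be made concrete with a toy model. ToyHDF5File is a hypothetical class written only for illustration, not the HDF5 library or its file format: every path is a name pointing at an object, the object counts its names, and removing one name (roughly what H5Ldelete() does to a hard link) releases the object only when no names remain:

```python
class ToyHDF5File:
    """Hypothetical toy model of hard-link bookkeeping (not the HDF5 library).

    Every name (path) points at an object id; every object counts how many
    names point at it and is released only when that count reaches zero.
    """

    def __init__(self):
        self.names = {}    # path -> object id
        self.objects = {}  # object id -> [payload, link count]
        self._next_id = 0

    def create_dataset(self, path, payload):
        # Creating an object also creates its first name (its first hard link).
        obj = self._next_id
        self._next_id += 1
        self.objects[obj] = [payload, 1]
        self.names[path] = obj

    def create_hard_link(self, existing_path, new_path):
        # Like H5Lcreate_hard(): the object must already exist under some name.
        obj = self.names[existing_path]
        self.names[new_path] = obj
        self.objects[obj][1] += 1

    def unlink(self, path):
        # Removing a name only drops one reference to the object.
        obj = self.names.pop(path)
        self.objects[obj][1] -= 1
        if self.objects[obj][1] == 0:
            del self.objects[obj]  # last name gone: the object is released

    def read(self, path):
        return self.objects[self.names[path]][0]


f = ToyHDF5File()
f.create_dataset("/entry/instrument/detector/data", [1, 2, 3])
f.create_hard_link("/entry/instrument/detector/data", "/entry/data/data")
f.unlink("/entry/instrument/detector/data")  # delete the "main" name
print(f.read("/entry/data/data"))             # the data is still reachable
```

Deleting /entry/instrument/detector/data leaves the array reachable through /entry/data/data; only a second `unlink("/entry/data/data")` would release it.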

MarkRivers commented 10 years ago

I noticed this same behavior yesterday, and asked Pete Jemian about it. He basically said the same thing as Tobias.

This is his reply (and my message):


I confirm your experience with the difference between plotting /entry/data/data and getting an error trying to plot /entry/instrument/detector/data.

The nature of hard links in HDF5 is similar to hard links in a file system. Any entry in the HDF5 directory (such as /entry/instrument/detector/data) is a hard link to the data object stored in the HDF5 file. When you call H5Lcreate_hard(), you are making an additional link from another place in the directory to the same data object. For convenience, H5Lcreate_hard() allows you to point at an existing directory item rather than the actual data reference.

If you examine the same file with HdfView, both directory items /entry/data/data and /entry/instrument/detector/data have the same (5732,2) data reference. Neither one of these items is the "original" data, from the point of view of HDF5 storage. They are both references to the same data object. If you delete one of them, the data object remains. Once you delete all items that reference a data object, then HDF5 can delete the data object from the file.

h5dump cannot tell which directory item was the one created when the data object was created. It picks one of them.

NeXus adds an additional attribute, "target", to the first directory item to indicate which can be considered the "original". Aside from guiding us readers of these files, this can be used to avoid data duplication when copying the entire file contents piecemeal, such as when translating from one format to another. This was introduced to convert between HDF4, HDF5, and XML. In the last couple of years, NeXus has agreed to only support HDF5 going forward.

I'll see if I can get to the bottom of the NeXpy problem you pointed out.

Pete

On 10/6/2014 6:48 PM, Mark Rivers wrote:

Hi Pete,

I've attached the file. I have now managed to get the hard link code running on the HDF5 plugin, so the data appears in 2 places:

/entry/data/data This plots OK

/entry/instrument/detector/data This gives the error I sent previously

You said previously:

One such program is "NeXpy". It is a Python-based GUI. A version can run from /APSshare:

    /APSshare/epd/rh6-x86_64/bin/nexpy

If that class path does not exist, nexpy won't plot the data.

Using the model I saw in the new HDF5 plugin, by adding the hard link, nexpy should be able to plot the data.

I interpreted this to mean that it would plot that dataset if I simply did "Data/Plot Data" without selecting any data source, i.e. it would use /entry/data/data as the default. That does not work, I get a Python error.

I also have a question about the output of h5dump:

This is the API for H5Lcreate_hard from the HDF documentation:


Name: H5Lcreate_hard

Signature: herr_t H5Lcreate_hard( hid_t obj_loc_id, const char *obj_name, hid_t link_loc_id, const char *link_name, hid_t lcpl_id, hid_t lapl_id )

Purpose: Creates a hard link to an object.

Description: H5Lcreate_hard creates a new hard link to a pre-existing object in an HDF5 file. The new link may be one of many that point to that object.

The target object must already exist in the file.

obj_loc_id and obj_name specify the location and name, respectively, of the target object, i.e., the object that the new hard link points to.

link_loc_id and link_name specify the location and name, respectively, of the new hard link.


So the first 2 arguments refer to the link target (i.e. the real object already in the file), and the second 2 are the new link.

This is my code that creates the hard link:

    // H5P_DEFAULT (numerically 0) selects the default link creation/access property lists
    herr_t err = H5Lcreate_hard(this->file, targetName.c_str(), this->file, linkName.c_str(), H5P_DEFAULT, H5P_DEFAULT);
    printf("%s::%s called H5Lcreate_hard, target=%s, link=%s\n",
           driverName, functionName, targetName.c_str(), linkName.c_str());

This is the output when I run the program.

NDFileHDF5::createHardLinks called H5Lcreate_hard, target=/entry/instrument/detector/data, link=/entry/data/data

So my real object is /entry/instrument/detector/data and my link is /entry/data/data

However, when I run h5dump on the file it looks like the opposite:

    corvette:~/scratch>h5dump --header hdf5_link_xml_file_010.h5
    HDF5 "hdf5_link_xml_file_010.h5" {
    GROUP "/" {
    GROUP "entry" {
       ATTRIBUTE "NX_class" {
          DATATYPE  H5T_STRING {
                STRSIZE 7;
                STRPAD H5T_STR_NULLTERM;
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
          DATASPACE  SCALAR
       }
       GROUP "data" {
          ATTRIBUTE "NX_class" {
             DATATYPE  H5T_STRING {
                   STRSIZE 6;
                   STRPAD H5T_STR_NULLTERM;
                   CSET H5T_CSET_ASCII;
                   CTYPE H5T_C_S1;
                }
             DATASPACE  SCALAR
          }
          DATASET "data" {
             DATATYPE  H5T_STD_U8LE
             DATASPACE  SIMPLE { ( 1024, 1024 ) / ( 1024, 1024 ) }
             ATTRIBUTE "NX_class" {
                DATATYPE  H5T_STRING {
                      STRSIZE 3;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             ATTRIBUTE "signal" {
                DATATYPE  H5T_STD_I32LE
                DATASPACE  SCALAR
             }
          }
       }
       GROUP "instrument" {
          ATTRIBUTE "NX_class" {
             DATATYPE  H5T_STRING {
                   STRSIZE 12;
                   STRPAD H5T_STR_NULLTERM;
                   CSET H5T_CSET_ASCII;
                   CTYPE H5T_C_S1;
                }
             DATASPACE  SCALAR
          }
          GROUP "NDAttributes" {
             ATTRIBUTE "CameraManufacturer" {
                DATATYPE  H5T_STRING {
                      STRSIZE 18;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             ATTRIBUTE "CameraModel" {
                DATATYPE  H5T_STRING {
                      STRSIZE 15;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             ATTRIBUTE "Gettysburg" {
                DATATYPE  H5T_STRING {
                      STRSIZE 42;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             ATTRIBUTE "ID_Energy_EGU" {
                DATATYPE  H5T_STRING {
                      STRSIZE 3;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             ATTRIBUTE "NX_class" {
                DATATYPE  H5T_STRING {
                      STRSIZE 12;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             ATTRIBUTE "RingCurrent_EGU" {
                DATATYPE  H5T_STRING {
                      STRSIZE 2;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             ATTRIBUTE "hostname" {
                DATATYPE  H5T_STRING {
                      STRSIZE 25;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             DATASET "AcquireTime" {
                DATATYPE  H5T_IEEE_F64LE
                DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                ATTRIBUTE "description" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 19;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                ATTRIBUTE "source" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 23;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
             }
             DATASET "E" {
                DATATYPE  H5T_IEEE_F64LE
                DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                ATTRIBUTE "description" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 17;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                ATTRIBUTE "source" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 12;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
             }
             DATASET "ID_Energy" {
                DATATYPE  H5T_IEEE_F64LE
                DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                ATTRIBUTE "description" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 16;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                ATTRIBUTE "source" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 11;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
             }
             DATASET "ImageCounter" {
                DATATYPE  H5T_STD_I32LE
                DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                ATTRIBUTE "description" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 13;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                ATTRIBUTE "source" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 13;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
             }
             DATASET "MaxSizeX" {
                DATATYPE  H5T_STD_I32LE
                DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                ATTRIBUTE "description" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 15;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                ATTRIBUTE "source" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 10;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
             }
             DATASET "MaxSizeY" {
                DATATYPE  H5T_STD_I32LE
                DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                ATTRIBUTE "description" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 15;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                ATTRIBUTE "source" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 10;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
             }
             DATASET "Pi" {
                DATATYPE  H5T_IEEE_F64LE
                DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                ATTRIBUTE "description" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 11;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                ATTRIBUTE "source" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 12;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
             }
             DATASET "RingCurrent" {
                DATATYPE  H5T_IEEE_F64LE
                DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                ATTRIBUTE "description" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 20;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                ATTRIBUTE "source" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 13;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
             }
             DATASET "Ten" {
                DATATYPE  H5T_STD_I32LE
                DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                ATTRIBUTE "description" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 8;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                ATTRIBUTE "source" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 12;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
             }
             DATASET "timestamp" {
                DATATYPE  H5T_IEEE_F64LE
                DATASPACE  SIMPLE { ( 1, 5 ) / ( 1, 5 ) }
             }
          }
          GROUP "detector" {
             ATTRIBUTE "NX_class" {
                DATATYPE  H5T_STRING {
                      STRSIZE 10;
                      STRPAD H5T_STR_NULLTERM;
                      CSET H5T_CSET_ASCII;
                      CTYPE H5T_C_S1;
                   }
                DATASPACE  SCALAR
             }
             GROUP "NDAttributes" {
                ATTRIBUTE "NX_class" {
                   DATATYPE  H5T_STRING {
                         STRSIZE 12;
                         STRPAD H5T_STR_NULLTERM;
                         CSET H5T_CSET_ASCII;
                         CTYPE H5T_C_S1;
                      }
                   DATASPACE  SCALAR
                }
                DATASET "ColorMode" {
                   DATATYPE  H5T_STD_I32LE
                   DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
                   ATTRIBUTE "description" {
                      DATATYPE  H5T_STRING {
                            STRSIZE 10;
                            STRPAD H5T_STR_NULLTERM;
                            CSET H5T_CSET_ASCII;
                            CTYPE H5T_C_S1;
                         }
                      DATASPACE  SCALAR
                   }
                   ATTRIBUTE "source" {
                      DATATYPE  H5T_STRING {
                            STRSIZE 6;
                            STRPAD H5T_STR_NULLTERM;
                            CSET H5T_CSET_ASCII;
                            CTYPE H5T_C_S1;
                         }
                      DATASPACE  SCALAR
                   }
                }
             }
             DATASET "data" {
                HARDLINK "/entry/data/data"
             }
          }
          GROUP "performance" {
          }
       }
    }
    }
    }

Note that /entry/data/data looks like the real array, and /entry/instrument/detector/data looks like it is just a HARDLINK?

What will happen if I put a second hardlink to the same data? Will it show up looking like data or like a link?

Thanks,

Mark

-----Original Message-----
From: Pete Jemian [mailto:jemian@anl.gov]
Sent: Monday, October 06, 2014 5:26 PM
To: Mark Rivers
Cc: Ray Osborn
Subject: Fwd: RE: nxvalidate?

Mark:

From your error report below, NeXpy has tried to examine your NDattributes folder for the "shape" of the data to be plotted. Your data is in:

    /entry/instrument/detector/data

Your NDattributes folder is a sibling of the "data" field.

This error is produced in the NeXpy file /src/nexpy/api/nexus/tree.py in the NXfield class, __getattr__() method, where it is trying to find numpy's ndarray attributes (note the similarity in names here). "shape" is one attribute of a numpy ndarray. NeXpy has a problem to be resolved in either the tree API or the caller of that API when plotting this data (by double-clicking the /entry/instrument/detector/data field).

Can you provide a test data file?

Pete

-------- Original Message --------
Subject: RE: nxvalidate?
Date: Mon, 6 Oct 2014 22:01:11 +0000
From: Mark Rivers
To: 'Pete Jemian'

That worked. However, when I try to view images created with areaDetector I get the attached errors for the attached file format. Hovering over /entry/instrument/detector/data shows that it is uint8(1024x1024), but when I double click I get the error: "shape not in NXcollection:NDAttributes"

What's wrong?

Mark

-----Original Message-----
From: Pete Jemian [mailto:jemian@anl.gov]
Sent: Monday, October 06, 2014 3:23 PM
To: Mark Rivers
Subject: Re: nxvalidate?

PySide has its own Qt installation. Merging these is beyond my experience. Instead, I have installed a separate Python and added PySide to that. Better use of your time to try an alternative way. Here's another Python to try. I just booted that one on usaxscontrol2.cars.aps.anl.gov and it starts with no troubles.

    /APSshare/anaconda/x86_64/bin/nexpy

On 10/6/2014 3:19 PM, Mark Rivers wrote:

> I want to use the Qt installation in /usr/local/Trolltech/Qt-4.8.4/, rather than my system install in /usr/lib64/, which is Qt-4.7.4. How do I do that?
>
> -----Original Message-----
> From: Pete Jemian [mailto:jemian@anl.gov]
> Sent: Monday, October 06, 2014 3:14 PM
> To: Mark Rivers
> Subject: Re: nxvalidate?
>
> Looks like the libraries from your /usr/local installation of Qt are being found before the libraries in /usr/lib64/ needed to install PySide.
>
> On 10/6/2014 12:25 PM, Mark Rivers wrote:
>
> > I do have Qt and qmake installed, and qmake is in my PATH:
> >
> > corvette:~/scratch/NeXpy-0.4.5>which qmake
> >
> > /usr/local/Trolltech/Qt-4.8.4/bin//qmake


Pete R. Jemian, Ph.D. jemian@anl.gov
Beam line Controls and Data Acquisition, Group Leader
Advanced Photon Source, Argonne National Laboratory
Argonne, IL 60439 630-252-3189

From: Ulrik Kofoed Pedersen [mailto:notifications@github.com] Sent: Tuesday, October 07, 2014 9:50 AM To: areaDetector/ADCore Cc: Mark Rivers Subject: Re: [ADCore] HDF5 plugin to support defining hard links in XML layout (#31)

Just a quick comment to let you know that I have discussed this funny behaviour of hard links with my colleague Tobias Richter, who has a lot of NeXus/HDF5 experience.

What confused me was that the hard link in my example would appear as the main dataset when viewed with h5ls or h5dump. Tobias confirmed that he has noticed this behaviour too when working with NeXus/HDF5 files.

We think it may be due to datasets and hard links sharing the same object ID (like inodes on a filesystem). The applications h5dump and h5ls iterate through the file's object IDs and probably select the first one they find as the 'main' dataset, with all others of the same object ID labelled as references...

It really doesn't make any difference, as under the hood all the dataset names are just references (links) to objects stored in the file. Apparently, even if you delete the main dataset but still have a hard link pointing to it, the real data is not deleted (or lost) from disk.

— Reply to this email directly or view it on GitHub: https://github.com/areaDetector/ADCore/issues/31#issuecomment-58196531.
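To make the inode analogy concrete, here is a minimal toy model in plain Python. It is not the HDF5 library and not how h5ls or h5dump are implemented; the class `ToyFile` and its methods are hypothetical. Link paths map to shared object IDs, a walker reports the first name it meets for each object as the dataset and any later name as a hard link, and an object's payload is freed only when its last name is removed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """Hypothetical toy model of HDF5 naming: link paths point at object IDs."""

    links: dict = field(default_factory=dict)    # link path -> object ID
    objects: dict = field(default_factory=dict)  # object ID -> stored payload
    next_id: int = 0

    def create_dataset(self, path, payload):
        # Writing a dataset stores one object plus its first name (link).
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = payload
        self.links[path] = oid
        return oid

    def link_hard(self, existing_path, new_path):
        # A hard link is simply another name for the same object ID.
        self.links[new_path] = self.links[existing_path]

    def unlink(self, path):
        # Removing a name frees the object only when no other name refers to it.
        oid = self.links.pop(path)
        if oid not in self.links.values():
            del self.objects[oid]

    def walk(self):
        # Visit names in sorted order (an assumption of this model); report the
        # first name seen for each object as DATASET and later ones as HARDLINK.
        seen = {}
        report = []
        for path in sorted(self.links):
            oid = self.links[path]
            if oid in seen:
                report.append((path, "HARDLINK", seen[oid]))
            else:
                seen[oid] = path
                report.append((path, "DATASET", None))
        return report


f = ToyFile()
f.create_dataset("/entry/instrument/detector/data", b"image bytes")
f.link_hard("/entry/instrument/detector/data", "/entry/data/data")
for row in f.walk():
    print(row)
```

Because this walker visits names in sorted order, /entry/data/data sorts before /entry/instrument/detector/data and is reported as the dataset, which mirrors the confusion described above. Calling `f.unlink("/entry/instrument/detector/data")` afterwards still leaves the payload reachable through /entry/data/data.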

prjemian commented 10 years ago

HDF5 hard links are common. When data is first stored, a data object is created (such as a 2-D array of 16-bit integers, 1k x 1k). At the same time, a reference (hard link) to this data object is created as a directory entry (such as /entry/instrument/detector/data). When a file writer chooses to create a "hard link", they first start with a directory entry and then name the new directory entry to be created (such as /entry/data/data). This request creates a new directory reference to the same data object. To the casual browser of the HDF5 file, the directory entries look identical without regard to which one created the data object and which one was the added link.

This is easy to see using HDFView. (The file is the same as yesterday's, but modified by Ray Osborn to add data axes for plotting.) The data object is stored at offset (11336,2) whether viewed from /entry/instrument/detector/data or from /entry/data/data.

NeXus encountered the same situation, which could lead to data duplication, for example when translating the content of a data file between HDF5, HDF4, and XML. To avoid that, NeXus adds an attribute target="/HDF5/path/to/source/directory/entry" when creating the hard link for the first time. Because the attribute belongs to the shared data object, every directory entry that links to it sees the same value, so a translator can compare each entry's name with the target attribute: if they do not match, the entry is a link and its data should not be duplicated.

Here is Python code (using h5py) to create an HDF5 link in a NeXus data file:

def makeLink(parent, sourceObject, targetName):
    """
    create an internal NeXus (hard) link in an HDF5 file

    :param obj parent: parent group of source
    :param obj sourceObject: existing HDF5 object
    :param str targetName: HDF5 node path to be created,
                            such as ``/entry/data/data``
    """
    if 'target' not in sourceObject.attrs:
        # NeXus link, NOT an HDF5 link!
        sourceObject.attrs["target"] = str(sourceObject.name)
    # Assigning an existing h5py object to a new name creates an HDF5 hard link;
    # this is the high-level equivalent of the low-level
    # parent._id.link(sourceObject.name, targetName, h5py.h5g.LINK_HARD) call.
    parent[targetName] = sourceObject
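The duplicate-avoidance rule that consumes this attribute can be sketched without any HDF5 library, assuming a hypothetical mapping of entry paths to their attribute dictionaries (for example, gathered by walking a file). Since hard links share one object, and therefore its attributes, every name sees the same `target` value, so only the name equal to it is treated as the original. The function name `plan_translation` is illustrative and is not part of NeXus or h5py.

```python
def plan_translation(entries):
    """
    Decide, per entry, whether a translator should copy data or re-create a link.

    entries: hypothetical mapping of HDF5 path -> dict of attributes.
    Returns a dict of path -> ("copy", None) or ("link", original_path).
    """
    plan = {}
    for path, attrs in entries.items():
        target = attrs.get("target")
        if target is None or target == path:
            # No NeXus target attribute, or this entry is the original itself.
            plan[path] = ("copy", None)
        else:
            # Name differs from target: this entry is a link, so do not duplicate.
            plan[path] = ("link", target)
    return plan


shared = {"target": "/entry/instrument/detector/data"}  # seen by both names
print(plan_translation({
    "/entry/instrument/detector/data": shared,
    "/entry/data/data": shared,
}))
```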

MarkRivers commented 10 years ago

My current implementation uses syntax like this

  <hardlink name="data" source="/entry/instrument/detector/data"></hardlink>

It thus uses the XML attribute “source”.

The HDF5 documentation and the Linux man page for the “ln” command both call the object being pointed to the “target”, not the “source”. Thus I am proposing to change the syntax to:

  <hardlink name="data" target="/entry/instrument/detector/data"></hardlink>

Comments?

Mark
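Here is a sketch of how a reader of the proposed syntax might resolve each `<hardlink>` into the pair of paths a hard-link call needs, using Python's standard `xml.etree.ElementTree`. It assumes groups are nested `<group name="...">` elements under a root element called `hdf5_layout`; both are assumptions for illustration, and this is not the plugin's C++ parser.

```python
import xml.etree.ElementTree as ET


def collect_hardlinks(layout_xml):
    """
    Return (link_path, target_path) pairs for every <hardlink> in a layout.

    Assumes nested <group name="..."> elements and <hardlink> elements that
    carry "name" (the new entry) and "target" (the existing object).
    """
    root = ET.fromstring(layout_xml.strip())
    links = []

    def walk(element, prefix):
        for child in element:
            if child.tag == "group":
                walk(child, prefix + "/" + child.get("name"))
            elif child.tag == "hardlink":
                target = child.get("target")
                if target is None:
                    raise ValueError("hardlink %r has no target" % child.get("name"))
                links.append((prefix + "/" + child.get("name"), target))

    walk(root, "")
    return links


layout = """
<hdf5_layout>
  <group name="entry">
    <group name="data">
      <hardlink name="data" target="/entry/instrument/detector/data"></hardlink>
    </group>
  </group>
</hdf5_layout>
"""
print(collect_hardlinks(layout))
```

Each pair lines up with H5Lcreate_hard(): the target path is the existing object (obj_name) and the computed path is the new name (link_name). A missing target raises an error rather than silently skipping the link.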

prjemian commented 10 years ago

The HDF5 interface creates confusion by using "target" in this way. Perhaps "existing" is a synonym that does not confuse.

MarkRivers commented 10 years ago

“target” is the same term that the Linux man page for “ln” uses:

LN(1)                          User Commands                          LN(1)

NAME
       ln - make links between files

SYNOPSIS
       ln [OPTION]... [-T] TARGET LINK_NAME   (1st form)
       ln [OPTION]... TARGET                  (2nd form)
       ln [OPTION]... TARGET... DIRECTORY     (3rd form)
       ln [OPTION]... -t DIRECTORY TARGET...  (4th form)

DESCRIPTION
       In the 1st form, create a link to TARGET with the name LINK_NAME. In the 2nd form, create a link to TARGET in the current directory. In the 3rd and 4th forms, create links to each TARGET in DIRECTORY. Create hard links by default, symbolic links with --symbolic. When creating hard links, each TARGET must exist. Symbolic links can hold arbitrary text; if later resolved, a relative link is interpreted in relation to its parent directory.
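The same terminology can be checked on a real filesystem with Python's `os.link`, which creates a hard link pointing to its first argument under the name given by its second, the equivalent of `ln TARGET LINK_NAME`. This sketch assumes a filesystem that supports hard links and uses hypothetical file names; it also exercises the inode analogy raised earlier in the thread: both names share one inode, and the data survives deleting the original name.

```python
import os
import tempfile


def demo_ln_semantics():
    """Mirror `ln TARGET LINK_NAME` with os.link and report what changes."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data")          # existing file (TARGET)
        link_name = os.path.join(tmp, "data_link")  # new name (LINK_NAME)
        with open(target, "wb") as fh:
            fh.write(b"detector image")
        os.link(target, link_name)                  # like `ln data data_link`
        same_inode = os.stat(target).st_ino == os.stat(link_name).st_ino
        nlink = os.stat(link_name).st_nlink         # names pointing at the inode
        os.remove(target)                           # delete the "main" name
        with open(link_name, "rb") as fh:
            survivor = fh.read()                    # data is still there
        return same_inode, nlink, survivor


print(demo_ln_semantics())
```

Python's own documentation names the two arguments `src` and `dst`, another reminder that the naming is a matter of perspective.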

prjemian commented 10 years ago

Agreed, it is a matter of perspective. Documentation can help set that perspective. "Creating a link from this new entry back to the existing TARGET entry"

ulrikpedersen commented 10 years ago

I vote for "target" rather than "source" or "existing" or "reference" for that matter. I like the comparison with the 'ln' Linux command.

prjemian commented 10 years ago

Tested your branch on Linux Mint 17: wrote an HDF5 file using the default template and read it into NeXpy; no problems encountered.

Suggest you implement the "target" attribute as discussed previously when creating a hard link. This is a NeXus-specific addition. Could it be an option in the XML template's link specification to create this attribute if it does not already exist? The default behavior would be not to create the target attribute. That would satisfy the non-NeXus users of this plugin.

On 10/7/2014 7:32 AM, Mark Rivers wrote:

I've committed a new branch "rivers_hdf5" on github. Please try to test it and let me know of any problems.

prjemian commented 10 years ago

no problems encountered

prjemian commented 10 years ago

Add this to the hdf5_xml_layout_schema.xsd within the <xs:complexType name="hardlinkType"> element:

   <xs:attribute name="isNexus" type="xs:boolean" use="optional" default="false" />

A "true" value will trigger the creation of the "target" attribute as described.

ulrikpedersen commented 10 years ago

Hi Pete,

Thanks for the testing!

I would prefer not to have an "isNexus" attribute, as its use would obscure what is really going on. I much prefer having to instantiate the hard link explicitly in the XML, so that it is perfectly clear what the final file will look like.

The reason is that the XML should map the layout of the file exactly, with as little implicit functionality as possible. It is a matter of decoupling problem domains: the file layout is now entirely deferred to the XML file, and we should try hard to prevent layout definitions (NeXus or otherwise) from creeping back into the plugin code.

Secondly, I also want to keep the plugin and XML schema generic, i.e. without any NeXus-specific elements (or those of any other format, for that matter). Our main use case may be NeXus compatibility, but that can be achieved by configuring the XML layout file properly. To help and guide users in writing valid, working NeXus XML layout definitions, I propose that we provide a set of examples with the distribution, and maybe a NeXus-specific XSD schema. With a bit more development work we could even extend the plugin to support validation against an XSD (I think libxml2 supports this functionality).

I understand it would make life a little easier to have this "isNexus" tag, but I think the gain is relatively small compared with the downsides of embedding NeXus-specific functionality.

I hope that makes sense....

Cheers, Ulrik

MarkRivers commented 10 years ago

I don't think Pete was suggesting that the isNexus attribute would instantiate the hardlink, but rather that the isNexus would be an attribute of the hardlink element and would be used to create a "target" attribute that pointed to the original data location. But I agree with your point that everything should be explicit in the XML file, and that is how I have now implemented it.

I think we are very close to being able to release R2-1. I will ask Tim Mooney to convert the new and changed .adl files.

Mark


prjemian commented 10 years ago

Still, the file hdf5_layout_demo.xml needs some adjustment if it is to create a NeXus data file.

[1] The NXdetector group should be a child of an NXinstrument group.

NXentry
  NXinstrument
    NXdetector
  NXdata

And adjust the target attribute accordingly.

[2] The NXdata group should be a child of the NXentry group (as shown above).
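The nesting rules above can be checked mechanically. Here is a minimal sketch, assuming the layout has already been reduced to a hypothetical mapping of group paths to their NX_class values; it encodes only the three parent rules listed in this comment, not the full NeXus standard.

```python
# Required parent class for each NeXus class, per the tree in this comment only.
REQUIRED_PARENT = {
    "NXinstrument": "NXentry",
    "NXdetector": "NXinstrument",
    "NXdata": "NXentry",
}


def check_nesting(groups):
    """
    groups: hypothetical mapping of HDF5 group path -> NX_class value.
    Returns a list of problems; an empty list means these rules are satisfied.
    """
    problems = []
    for path, nx_class in sorted(groups.items()):
        wanted = REQUIRED_PARENT.get(nx_class)
        if wanted is None:
            continue  # no parent rule for this class in this sketch
        parent_path = path.rsplit("/", 1)[0] or "/"
        parent_class = groups.get(parent_path)
        if parent_class != wanted:
            problems.append(
                f"{path} ({nx_class}) should be inside an {wanted} group, "
                f"not {parent_class}"
            )
    return problems


compliant = {
    "/entry": "NXentry",
    "/entry/instrument": "NXinstrument",
    "/entry/instrument/detector": "NXdetector",
    "/entry/data": "NXdata",
}
print(check_nesting(compliant))
```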

ulrikpedersen commented 10 years ago

Agreed; hdf5_layout_demo.xml is just a generic demo that we happen to put some NeXus tags in.

Perhaps we should create another XML file for a fully compliant NeXus definition with a more appropriate name (hdf5_nexus_layout.xml)?

prjemian commented 10 years ago

Let's make this into 2 templates. One is HDF5 without NeXus, the other is compliant NeXus.

MarkRivers commented 10 years ago

The hard-coded default layout in NDFileHDF5LayoutXML.cpp is NeXus-compliant. I can extract that and make it the NeXus example template as well.

I can strip down the other one to take out all NeXus tags.

On that topic, when I extracted the hard-coded XML from NDFileHDF5LayoutXML.cpp into an XML file and ran it through xmllint with the .xsd file, it found an error:

    corvette:ADCore/ADApp/pluginSrc>xmllint --noout --schema ../../iocBoot/hdf5_xml_layout_schema.xsd default_hdf5_layout.xml
    default_hdf5_layout.xml:22: element dataset: Schemas validity error : Element 'dataset': The attribute 'source' is required but missing.
    default_hdf5_layout.xml fails to validate

The problem is that the “timestamp” dataset does not contain a “source” attribute. What should that be?

  <group name=\"performance\"> \
    <dataset name=\"timestamp\"></dataset> \
  </group>            <!-- end group performance --> \

Mark
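A quick stand-alone check for this particular problem can be written with Python's `xml.etree.ElementTree`. This sketch only reports `<dataset>` elements that lack a `source` attribute (the rule xmllint flagged); it is not a substitute for validating against hdf5_xml_layout_schema.xsd. The sample layout, including its `hdf5_layout` root element and `source="detector"` value, is hypothetical.

```python
import xml.etree.ElementTree as ET


def datasets_missing_source(layout_xml):
    """
    List the paths of <dataset> elements that have no "source" attribute.

    Covers only this one schema rule; full validation still needs the XSD.
    """
    root = ET.fromstring(layout_xml.strip())
    missing = []

    def walk(element, prefix):
        for child in element:
            path = prefix + "/" + child.get("name", "?")
            if child.tag == "group":
                walk(child, path)
            elif child.tag == "dataset" and "source" not in child.attrib:
                missing.append(path)

    walk(root, "")
    return missing


# Hypothetical sample layout, modelled on the "performance" group above.
sample_layout = """
<hdf5_layout>
  <group name="entry">
    <group name="performance">
      <dataset name="timestamp"></dataset>
    </group>
    <dataset name="data" source="detector"></dataset>
  </group>
</hdf5_layout>
"""
print(datasets_missing_source(sample_layout))
```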

MarkRivers commented 10 years ago

The hdf5_xml_layout_schema.xsd file says that the choices for the dataset source (dsetSourceEnum) include "constant", and that the dataset supports the attribute "value". Thus the following is legal syntax:

But when my XML file contains that line I get the following error when I write the file:

    HDF5-DIAG: Error detected in HDF5 (1.8.7) thread 0:
      #000: H5T.c line 2151 in H5Tset_size(): size must be positive
        major: Invalid arguments to routine
        minor: Bad value

Are source="constant" and the "value" attribute supported for datasets, either individually or in combination? How does it know the datatype of the dataset?

Am I doing something wrong?

Thanks, Mark

MarkRivers commented 10 years ago

Pete wrote: "Practically, the number of bytes required is not the concern, nor is the actual work of constructing such things."

Actually, constructing those in the areaDetector plugin is not trivial at all. If we want to include the dimension scale datasets in the XML file, it needs to be:

So we could only generate these by hardcoding in the plugin?

Mark

-----Original Message----- From: Pete Jemian [mailto:jemian@anl.gov] Sent: Wednesday, October 08, 2014 4:02 PM To: Mark Rivers Subject: Re: [NeXus-tech] Nexus server downtime

I agree with you that it is silly to the point of absurdity to require such trivial index arrays just to satisfy the notion of default visualization, and I have asserted that at NeXus meetings. I'll continue to do so. Practically, the number of bytes required is not the concern, nor is the actual work of constructing such things. The absurdity is that the default case (as seen by image data) is expected to generate the index arrays.

Look here for a discussion of how to describe the dimension scales: http://cansas-org.github.io/NXcanSAS/datarules.html#linking-multi-dimensional-data-with-axis-data

Look here for an explanation of what "axes" is: http://cansas-org.github.io/NXcanSAS/design.html#index-11

Look here for the "preferred (and recommended) method" to describe the dimension scales: http://cansas-org.github.io/NXcanSAS/classes/base_classes/NXdata.html#index-3

I cannot find any text that declares the axes attribute is required or that dimension scales must be provided. One reason for this is that the NeXus schema must accept as valid legacy data files that were created before the "axes" attribute was implemented. These legacy files use the "axis" attribute, as shown in: http://cansas-org.github.io/NXcanSAS/design.html#index-11

It should not be up to me or Ray or anyone else to decide if the file is compliant; the rule should be obvious. Reading the NXdata specification (the authoritative reference for this), there is no such requirement, despite the words "preferred (and recommended) method".

Pete

On 10/8/2014 3:28 PM, Mark Rivers wrote:

Hi Pete,

Thanks, I looked at the document.

Ray stated that the files we are producing are non-compliant because they don't contain the axes definitions. Can you point me to the place in the manual where this requirement is stated?

Does this mean that if I am doing radiography with a 2K x 2K detector, I must create two 1-D datasets, axis1 and axis2, each containing [0, 1, 2, 3, ..., 2047], or else my NeXus file is non-compliant? This seems silly. Sometimes pictures really don't need axes to be meaningful.
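To put the question in concrete terms: the index arrays themselves are trivial to compute; the difficulty raised in this thread is wiring them into the plugin and the XML layout. A minimal C sketch, assuming a hypothetical helper `make_index_scale` (not an areaDetector or HDF5 function), that builds one such 0..N-1 dimension scale:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of areaDetector or HDF5): allocate a 1-D
 * dimension scale holding the pixel indices [0, 1, ..., n-1].
 * Returns NULL on allocation failure; the caller must free() the result. */
static int *make_index_scale(size_t n)
{
    int *scale = malloc(n * sizeof *scale);
    if (scale == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        scale[i] = (int)i;   /* the "axis" is just the pixel index */
    return scale;
}
```

For the 2K x 2K case it would be called once per dimension (`make_index_scale(2048)` for axis1 and again for axis2). Writing the arrays as HDF5 datasets and referencing them from the NeXus "axes" attribute is not shown here.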

Mark

-----Original Message----- From: Pete Jemian [mailto:jemian@anl.gov] Sent: Wednesday, October 08, 2014 1:57 PM To: Mark Rivers Subject: Fwd: [NeXus-tech] Nexus server downtime

Yes, the NeXus web site is down. If you need the manual, I have a shadow copy of it in a GitHub fork I'm working on for canSAS. It has everything current plus a new contributed definition for canSAS.

http://cansas-org.github.io/NXcanSAS/

-------- Original Message -------- Subject: [NeXus-tech] Nexus server downtime Date: Wed, 8 Oct 2014 13:53:38 +0000 From: freddie.akeroyd@stfc.ac.uk Reply-To: NeXus Technical sub-committee list nexus-tech@nexusformat.org To: nexus-tech@nexusformat.org

There is power work again this evening in the room containing the NeXus wiki/doc builder server, so I'm afraid it will be down from 5pm BST until early tomorrow morning.

Regards,

Freddie


Pete R. Jemian, Ph.D. jemian@anl.gov Beam line Controls and Data Acquisition, Group Leader Advanced Photon Source, Argonne National Laboratory

Argonne, IL 60439 630 - 252 - 3189

Education is the one thing for which people
   are willing to pay yet not receive.

ulrikpedersen commented 10 years ago

I'm confused: I'm sure I commented on this last week, but now I find no reference to my comment here or in email. So apologies if I repeat myself:

Regarding the NeXus dataset axes: I did discuss this with Tobias Richter (and other colleagues from our Data Acquisition and Scientific Software groups), and they were all quite convinced that axes are not strictly required, but nice to have if possible.

I don't think there is a very elegant option to add support for axes in our current system. However, there are a few hacks/workarounds that could be used to add this support. The easiest could be to add N (=3 or =9) waveforms to the plugin, into which a user can load a given axis scale definition. This leads to a number of potential problems, of course, as it will be the user's responsibility to update the axes when changing ROI, frame size, zoom level, etc.
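One way the plugin could at least detect a forgotten update in the waveform workaround is to compare each loaded axis length against the current frame dimension before writing. A minimal C sketch under that assumption; `axis_waveform`, `axis_matches_dimension` and `first_stale_axis` are hypothetical names, not existing areaDetector records or parameters:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one user-loaded axis waveform: the values the
 * user loaded and how many elements of them are valid. */
typedef struct {
    const double *values;
    size_t        nelements;
} axis_waveform;

/* True if the loaded axis has exactly one value per pixel along a
 * dimension of dim_size. A mismatch suggests the ROI or frame size
 * changed without the axis being reloaded. */
static bool axis_matches_dimension(const axis_waveform *axis, size_t dim_size)
{
    return axis != NULL && axis->values != NULL && axis->nelements == dim_size;
}

/* Check every dimension of a frame; returns the index of the first axis
 * whose length does not match, or -1 if all axes fit. */
static int first_stale_axis(const axis_waveform *axes, const size_t *dims, int ndims)
{
    for (int i = 0; i < ndims; i++) {
        if (!axis_matches_dimension(&axes[i], dims[i]))
            return i;
    }
    return -1;
}
```

A length check only catches ROI or frame-size changes; a zoom or motor change that leaves the length unchanged would still go unnoticed, which is why responsibility for correct axes stays with the user or a supervisory application.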

How we (sometimes) solve this: on our beamlines, an external supervisory application (GDA) may add axes to the file, if any can be defined. GDA (supposedly) understands the full beamline setup, motor positions and other endstation configurations...

Regarding the constant dataset: I think this is a separate issue and have created #36 to discuss it.

prjemian commented 10 years ago

Agreed.

Here is an excerpt of what I wrote to Mark last week:

It should not be up to any one person to decide if a file is compliant with the NeXus standard; the rule should be obvious. Reading the NXdata specification (the authoritative reference for this), there is no such requirement, despite the words "preferred (and recommended) method".

The "axes" attribute identifies the "dimension scales" used to plot the independent axes.

Discussion of how to describe the dimension scales: http://download.nexusformat.org/doc/html/datarules.html#linking-multi-dimensional-data-with-axis-data

Explanation of what "axes" is: http://download.nexusformat.org/doc/html/design.html#index-11

The "preferred (and recommended) method" to describe the dimension scales: http://download.nexusformat.org/doc/html/classes/base_classes/NXdata.html#index-3

I cannot find any text that declares the axes attribute is required or that dimension scales must be provided.

Pete

ulrikpedersen commented 10 years ago

Mark implemented the hardlink feature. Closing the issue.