aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0

Custom type label - NX_CHAR #282

Closed wlwatkins closed 6 months ago

wlwatkins commented 6 months ago

Hi, I'm trying to implement the nexusformat using your crate. Everything is going quite smoothly; however, I have an issue regarding the label of the string types (and most types, for that matter). The nexusformat uses NX_CHAR, whereas saving to an h5 file with your crate leads to CHAR.

This is my output using punx

punx tree .\empty.nxs
C:\Users\USER\Documents\Coding\nexus-rs\assets\empty.nxs
  entry
    definition:CHAR = b'NXInstrument'
      @url = "https://manual.nexusformat.org/classes/..."
      @version = "1.0"

This is the code I use to create this dataset.

    pub fn to_dataset(&self, key: &str, g: &Group) -> Result<Dataset, Box<dyn Error>> {
        let ds = g.new_dataset::<types::VarLenAscii>();
        let value = types::VarLenAscii::from_ascii(self.to_value().as_str())?;
        let mut ds = ds.create(key)?;
        ds.write_scalar(&value)?;
        if let Some(attrs) = &self.attributes {
            for key in attrs.get_attributes_list() {
                attrs.to_attribute(key.as_str(), &mut ds)?;
            }
        }
        Ok(ds)
    }

where key = "definition", and the attributes are fields of self.

Is there a type I can use to make a custom type based on your primitives?

The output I am looking for is

punx tree .\empty.nxs
C:\Users\USER\Documents\Coding\nexus-rs\assets\empty.nxs
  entry
    definition:NX_CHAR = b'NXInstrument'
      @url = "https://manual.nexusformat.org/classes/..."
      @version = "1.0"
mulimoen commented 6 months ago

I am not familiar with the nexusformat so I am unsure what the mapping of hdf5 primitives to the nexusformat is. What is the output of h5dump --header C:\Users\USER\Documents\Coding\nexus-rs\assets\empty.nxs?

wlwatkins commented 6 months ago

Here is the output

 h5dump --header .\empty.nxs
HDF5 ".\empty.nxs" {
GROUP "/" {
   GROUP "entry" {
      DATASET "definition" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         ATTRIBUTE "url" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
         ATTRIBUTE "version" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
      }

and here is the output of a reference file that does show NX_CHAR with punx

 h5dump --header .\ellips_nx_opt.test.nxs
HDF5 ".\ellips_nx_opt.test.nxs" {
GROUP "/" {
   GROUP "entry" {
      ATTRIBUTE "NX_class" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_UTF8;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
      }
      DATASET "definition" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_UTF8;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         ATTRIBUTE "url" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
         ATTRIBUTE "version" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
      }

I have truncated both dumps so that they only show the attributes of interest, which is why some braces are unmatched. I see that I may have switched between UTF-8 and ASCII along the way, but the result is the same.

wlwatkins commented 6 months ago

Hmm, I got it: the nexusformat adds the attribute NX_class to define the type. I must have missed this in the docs. Thanks anyway.