aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0

Question: How to read a non-scalar string attribute using hdf5-rust #274

gpcureton closed 6 months ago

gpcureton commented 6 months ago

I've got an HDF5 file with the following structure (viewed with h5dump):

❯ h5dump -n GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5
HDF5 "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5" {
FILE_CONTENTS {
 group      /
 group      /All_Data
 group      /All_Data/VIIRS-MOD-GEO-TC_All
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Height
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Latitude
 dataset    /All_Data/VIIRS-MOD-GEO-TC_All/Longitude
 ...
 group      /Data_Products
 group      /Data_Products/VIIRS-MOD-GEO-TC
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Aggr
 dataset    /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0
 }
}

I am interested in using the hdf5-rust crate to read string attributes of both the root group /, and of the dataset /Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0. The signature of the dataset attribute is

ATTRIBUTE "N_Granule_ID" {
   DATATYPE  H5T_STRING {
      STRSIZE 16;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_ASCII;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 1, 1 ) / ( 1, 1 ) }
   DATA {
   (0,0): "NPP002194429582"
   }
}

I tried the following...

use anyhow::{Ok, Result};
use hdf5::File;
use ndarray::{Array, Array2};
use hdf5::types::VarLenUnicode;

fn main() -> Result<()> {

    let filename = "GMTCO_npp_d20181005_t2022358_e2024003_b35959_c20181008035331888329_cspp_dev.h5".to_string();
    let file = File::open(filename)?;

    let dataset = file.dataset("Data_Products/VIIRS-MOD-GEO-TC/VIIRS-MOD-GEO-TC_Gran_0")?;
    let attribute = dataset.attr("N_Granule_ID")?;
    let datatype = attribute.dtype()?;
    let dims = attribute.ndim();

    let v_reader = attribute.as_reader();
    let v = v_reader.read::<VarLenUnicode, ndarray::Dim<[usize; 2]>>()?;

    Ok(())
}

at which point the .read() method returns Error: no conversion paths found. I get the same error if I use

let v = attribute.read_2d::<VarLenUnicode>()?

or

let v = attribute.read_2d::<FixedUnicode<16_usize>>()?;

In the VarLenUnicode cases the variable v has the type ArrayBase<OwnedRepr<VarLenUnicode>, Dim<[usize; 2]>>; in the last case the element type is FixedUnicode<16> instead.

Looking through the hdf5-rust examples and tests, I haven't found any example of reading a non-scalar string attribute through the high-level interface. I suspect the stumbling block is that the attribute's DATASPACE describes an array rather than a scalar.

mulimoen commented 6 months ago

Could you try reading as FixedAscii<16> instead of VarLenUnicode?
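
For context, the h5dump output above describes a fixed-length (STRSIZE 16), null-terminated ASCII string, so a fixed-size ASCII type is a closer match to the stored datatype than a variable-length Unicode one. The following is a minimal standard-library sketch, not hdf5-rust code, of what such a fixed-size buffer holds and how the text is recovered from it. The helper name `decode_fixed_ascii` is hypothetical, and the buffer would be filled by hand here rather than read from the file.

```rust
/// Decode a fixed-size, NUL-terminated ASCII buffer into an owned String.
/// Returns None if the bytes before the first NUL are not pure ASCII.
/// (Hypothetical helper for illustration; not part of hdf5-rust.)
fn decode_fixed_ascii<const N: usize>(buf: &[u8; N]) -> Option<String> {
    // The string ends at the first NUL byte, or at N if the buffer is full.
    let end = buf.iter().position(|&b| b == 0).unwrap_or(N);
    let bytes = &buf[..end];
    if bytes.is_ascii() {
        // ASCII is always valid UTF-8, so this conversion cannot fail here.
        std::str::from_utf8(bytes).ok().map(str::to_owned)
    } else {
        None
    }
}
```

Given a 16-byte buffer holding the 15 characters of NPP002194429582 followed by a single NUL, the helper returns that 15-character ID; a 4-byte buffer holding NPP and a NUL yields "NPP".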

gpcureton commented 6 months ago

Thanks for your reply @mulimoen. For the root group attribute

let root_attr = file.attr("Mission_Name")?;

I tried

let v_reader = root_attr.as_reader();
let v = v_reader.read::<FixedAscii<4>, ndarray::Dim<[usize; 2]>>()?;
println!("\tv = {:?}", v);

and

let v = root_attr.read_2d::<FixedAscii<4>>()?;
println!("\tv = {:?}", v);

and they both gave the result

v = [["NPP"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), const ndim=2

and I got to the attribute payload with

if let Some(x) = v.first() {
    print!("\tx = {:?}", x.to_string());
}

which is what I was after. Luckily the attributes I am interested in have fixed sizes which I know ahead of time. I'm going to check a string attribute which is a "vector" of strings, and then close this issue.

gpcureton commented 6 months ago

I was also able to read in a "vector" string attribute (something like a list of filenames). The filenames are of differing lengths, but as long as the argument to FixedAscii<> is equal to or greater than the length of the longest filename, it works...

println!("\n\nReading dataset (15, 1) attribute...\n");

let dset_attr = dataset.attr("N_Anc_Filename")?;

let v = dset_attr.read_2d::<FixedAscii<104>>()?;

println!("\tv = {:?}", v);
println!("\tv.shape() = {:?}", v.shape());
println!("\tv.strides() = {:?}", v.strides());
println!("\tv.ndim() = {:?}", v.ndim());

let arr = v.iter().collect::<Vec<_>>();

// Print each element together with its index.
for (idx, val) in arr.iter().enumerate() {
    println!("\tarr[{:?}] = {:?}", idx, val);
}

for (idx, val) in arr.iter().enumerate() {
    println!("\tarr[{:?}] = {:?} ({:?})", idx, val.to_string(), val.len());
}
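
The capacity rule described above (the argument to FixedAscii<> must be at least the length of the longest string) can be checked ahead of time when the strings are known. This is a small standard-library sketch under that assumption; `longest_len` and `fits_capacity` are hypothetical helpers, and whether an extra byte must be reserved for a terminating NUL is not addressed here.

```rust
/// Length in bytes of the longest name (0 for an empty list).
/// (Hypothetical helper for illustration; not part of hdf5-rust.)
fn longest_len(names: &[&str]) -> usize {
    names.iter().map(|name| name.len()).max().unwrap_or(0)
}

/// True if every name fits in a fixed-size buffer of `capacity` bytes.
fn fits_capacity(names: &[&str], capacity: usize) -> bool {
    longest_len(names) <= capacity
}
```

Because the argument to FixedAscii<> is a compile-time constant, a check like this is mainly useful as a sanity test on sample data before settling on a capacity such as 104.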

This covers the most complicated use case for the files I am reading, so I'm closing this issue. Thanks again for your tip, @mulimoen!