georust / netcdf

High-level netCDF bindings for Rust
Apache License 2.0
81 stars 28 forks source link

Path subgroup access #113

Closed AntoineRenaud91 closed 10 months ago

AntoineRenaud91 commented 10 months ago

Summary

This pull request introduces new methods for the File, MutableFile, Group, and MutableGroup structs. These methods enable users to fetch a specific subgroup, variable, or attribute within a netCDF structure using a relative path formatted like a Linux file system ("group1/group2/.../var").

Key Features

Example Unit Test

Here's a unit test to demonstrate how the new feature works (can be found in netcdf/tests/file.rs):

#[test]
fn fetch_from_path() {
    // Create a temporary directory and netCDF file
    let d = tempfile::tempdir().unwrap();
    let path = d.path().join("cdf5.nc");
    {
        let mut file = netcdf::create(path.clone()).unwrap();
        let mut group = file.add_group("grp").unwrap();
        let mut subgroup = group.add_group("subgrp").unwrap();
        subgroup.add_dimension("dim", 1).unwrap();
        subgroup.add_variable::<f64>("var", &["dim"]).unwrap();
        subgroup.add_attribute("attr", "test").unwrap();
    }

    // Open the netCDF file and test the new methods
    let file = netcdf::open(path).unwrap();
    assert_eq!(
        file.group_from_path("grp/subgrp")
            .unwrap()
            .unwrap()
            .variable("var")
            .unwrap()
            .name(),
        file.variable_from_path("grp/subgrp/var").unwrap().name(),
    );

    match file
        .attribute_from_path("grp/subgrp/attr")
        .unwrap()
        .value()
        .unwrap()
    {
        netcdf::AttrValue::Str(string) => assert_eq!(string, "test"),
        _ => panic!(),
    }
}

This feature enhances user experience by providing a more intuitive and efficient way to navigate complex netCDF files.

AntoineRenaud91 commented 10 months ago

I am not sure how to solve the last failed test. It seems that the test is not finding the libnetcdf library, which I don't believe is likely due to my changes.

magnusuMET commented 10 months ago

This feature looks great! In fact, why don't we make this the default so we could do

let var = file.variable("my/group/here/variable")?;

This works like the official python library and makes groups much more pleasant compared to the lifetime struggle with the old approach.

You can ignore the failing test and I will try to figure out what is going wrong there. It is not related to your changes

AntoineRenaud91 commented 10 months ago

I'm glad you like this feature. Should I make a commit to set it as the default behavior and subsequently remove all the '_from_path' methods? It seems very unlikely that existing projects using this crate would have group or variable names with slashes.

magnusuMET commented 10 months ago

Would be great if you made that commit. Slashes are not allowed in object names in hdf5, so I would think they need to be disallowed in netCDF too.

AntoineRenaud91 commented 10 months ago

Done ! :)

AntoineRenaud91 commented 10 months ago

Just avoiding an unnecessary allocation.

I know the scope has increased a bit from the original idea, but this is a great ergonomic boost. Some other cases I have though up:

* Dimensions should also allow a group name

* What should we do if we try to add a variable `in/a/group`? Automatically create the group?

* What should we return if we request a variable in a group which does not exist? Return `None` or an error?

I think the API would be safer if we allowed recursive group creation only through the add_groupmethod. I would expect all other "add" methods to return an error if the subgroup structure doesn't exist.

As for requesting, I think it should return an error if the parent groups do not exist.

Maybe we should add new error variant ?

magnusuMET commented 10 months ago

We should probably add a new error variant, but that would be a breaking change which we could avoid by returning an Error::Str (and a TODO on Error for when we have to break the API)

AntoineRenaud91 commented 10 months ago

I just commited these changes. I created a function called 'crate' to transform an 'ncid' and a 'path' string into the 'ncid' of the deepest subgroup and the stem name. It returns an error if any of the subgroups do not exist. This way, I didn't have to make significant modifications to all the 'add' methods. I simply used this function to update the 'ncid' and the stem name.

magnusuMET commented 10 months ago

Thanks a lot for your work on this @AntoineRenaud91, this is an awesome feature!

AntoineRenaud91 commented 10 months ago

Happy to provide a useful contribution :)