NeurodataWithoutBorders / matnwb

A Matlab interface for reading and writing NWB files
BSD 2-Clause "Simplified" License
49 stars 32 forks source link

[Bug]: Softlinks does not enforce the correct type, nor is it required to add a linked type as a softlink #559

Open ehennestad opened 5 months ago

ehennestad commented 5 months ago

What happened?

This issue related to the following points:

  1. I can add a linked type directly to an NWB type/class without using the types.untyped.Softlink. Example (adding a device):
    
    % Create a device
    device = types.core.Device();
    nwb.general_devices.set('Device', device);

imaging_plane = types.core.ImagingPlane( ... 'description', 'a very interesting part of the brain', ... 'device', device, ...


2. I can add another type to the `device` property using a SoftLink class.

Example:

% Create an optical channel. optical_channel_green = types.core.OpticalChannel( ... 'description', 'green', ... 'emission_lambda', 500.);

imaging_plane = types.core.ImagingPlane( ... 'description', 'a very interesting part of the brain', ... 'device', types.untyped.SoftLink(optical_channel_green), ...


In both cases the nwb export works fine.

This issue might be related to this one:
https://github.com/NeurodataWithoutBorders/matnwb/issues/48

The concept of h5 links / softlinks might be too specific for the average user to care about so maybe it would be better to handle this internally? 

I.e in the set method for a linked type, if the value is not a `types.untyped.SoftLink` convert it to one? This would take away the burden from the user to check whether a type is linked or "embedded" (i.e an optical channel is "embedded", not linked as far as I can see). It would also ensure that it's harder to add types that are not correct, as in point 2.

Further, if the user provides an object of type  `types.untyped.SoftLink`,  the validator should check that the `target` of the `types.untyped.SoftLink` is of the correct type.

See here for three .m scripts to reproduce:
https://www.dropbox.com/scl/fo/bdvm4tgc79n2w4sn4f2ij/h?rlkey=wccijdyeaxm1b5gonzwmcu0w2&dl=0

### Steps to Reproduce

```matlab
See linked files above

Error Message

No errors

Operating System

macOS

Matlab Version

MATLAB Version: 23.2.0.2409890 (R2023b) Update 3 MATLAB License Number: 885740 Operating System: macOS Version: 13.5 Build: 22G74 Java Version: Java 1.8.0_392-b08 with Amazon.com Inc. OpenJDK 64-Bit Server VM mixed mode

Code of Conduct

ehennestad commented 5 months ago

Also, is there a reason a softlink is only being resolved when the NWB file is being exported, and not when the SoftLink is created in the first place?

lawrence-mbf commented 5 months ago

Also, is there a reason a softlink is only being resolved when the NWB file is being exported, and not when the SoftLink is created in the first place?

I actually don't know if softlink is ever technically validated. HDF5 certainly doesn't care if you write an invalid one.

Take this script for example: ```matlab fid = H5F.create('test.h5'); sl1 = types.untyped.SoftLink('/dne1'); sl1.export(fid, '/link1', {}); sl2 = types.untyped.SoftLink('/dne2'); sl2.export(fid, '/link2', {}); sl3 = types.untyped.SoftLink('/dne3'); sl3.export(fid, '/link3', {}); H5F.close(fid); ``` This creates a valid hdf5 file, you just get missed target paths in HDFView. ![Screenshot 2024-02-01 100600](https://github.com/NeurodataWithoutBorders/matnwb/assets/33462079/59ae10d6-1890-428c-8b2a-de4963648ef2)

That said, you cannot actually validate if an object is "pointable" by HDF5 path until you actually write the object to the same file as the soft link which is probably why any kind of validation is done at export level. Certainly, though, someone could add some validation on the type of object when passing softlinks in, but this is also only the case if the softlink was provided with a target object instead of a raw path (some old matnwb scripts probably still use this).

I do like the idea of auto-coercing to a SoftLink but I would argue that explicitly defining the SoftLink lets the user know that the data is actually not written to disk if wrapped in one. The user still has to embed the file somewhere or you will produce a NWB file with invalid links.

Note that I will not be doing much more work on MatNWB but I leave the repo open in case PRs come in. I would imagine that the repo itself will be archived at some point in the future.

ehennestad commented 5 months ago

That said, you cannot actually validate if an object is "pointable" by HDF5 path until you actually write the object to the same file as the soft link which is probably why any kind of validation is done at export level.

I didn't look into this in detail, but I imagine you could easily modify the NWB File object to store an internal cache or map of all the objects that have been added to it.

I.e, when I do nwb.general_devices.set('Device', device) that could be flagged somehow on the nwb file object, and when I later add a softlink, it could validate against the objects that are already cached/flagged.

I can imagine it is not foolproof, but the alternative is to allow types to be added as groups instead of links or even adding the completely wrong types, and this is a violation of the nwb schemas.

I would argue that explicitly defining the SoftLink lets the user know that the data is actually not written to disk if wrapped in one. The user still has to embed the file somewhere or you will produce a NWB file with invalid links.

I don't think anyone who is not very familiar with h5 would catch this nuance.

Note that I will not be doing much more work on MatNWB but I leave the repo open in case PRs come in. I would imagine that the repo itself will be archived at some point in the future.

Oh, that's unfortunate, I hope there will be a way to keep it alive still!

lawrence-mbf commented 5 months ago

I didn't look into this in detail, but I imagine you could easily modify the NWB File object to store an internal cache or map of all the objects that have been added to it.

With the way that MatNWB has designed assignments, keeping a map at the highest level will only work if the assignment includes the NwbFile object i.e. nwb.acquisition.set('acq', obj);. But this does not work for anything more advanced object compositions like, say Dynamic Tables. You can easily construct a DynamicTable without touching the nwb object at all (which would not be caught by subsref). I suppose you could have all classes do this and have the nwb object just merge the maps at the top when the top-level subsref receives the object. Then you could check broken soft links prior to export.

...but the alternative is to allow types to be added as groups instead of links or even adding the completely wrong types, and this is a violation of the nwb schemas.

I agree, this is an oversight in design of SoftLinks. Some low-hanging fruit for development here is to just generalize the ExternalLink validation (which ironically has better and more thorough checks) so at least the wrong type is not passed in.

ehennestad commented 4 months ago

Pt 2 from the first post is also true for types.untyped.ObjectView

Also here there is an opportunity for hiding the implementation of ObjectView as suggested for Softlinks

I.e in the set method for a linked type, if the value is not a types.untyped.SoftLink convert it to one? This would take away the burden from the user to check whether a type is linked or "embedded" (i.e an optical channel is "embedded", not linked as far as I can see). It would also ensure that it's harder to add types that are not correct, as in point 2.