BUNPC / pysnirf2

Python package for reading, writing and validating Shared Near Infrared Spectroscopy Format (SNIRF) files
GNU General Public License v3.0

API to traverse/list content tree programmatically? #44

zEdS15B3GCwq closed 4 months ago

zEdS15B3GCwq commented 7 months ago

Hi,

What's the intended way to list/traverse the contents of a SNIRF file in a script? That is, if I pick a node, how do I enumerate its children, and how do I decide whether each child is an attribute or an indexed group that may have children of its own? I want the implementation to be future-proof (and simple) by not depending on the current SNIRF specification (which provides a list of acceptable elements). Traversing the data like a tree seems to be the most straightforward way to do it.

Fields like _snirf_names, _unspecified_names and _indexed_groups seem to contain the information I need, but it's unpythonic to mess with private fields and I'm not sure I can count on them in the long term. There also doesn't seem to be a clear way to tell whether an item in _snirf_names is an indexed group or just a property/attribute. E.g. after sn=Snirf(...), sn.nirs is an indexed group: it appears in sn._snirf_names and its object ID is present in sn._indexed_groups, but is there an easy way to know that sn.nirs is an array? It doesn't have array-like or dict-like methods, unless I'm missing something. It's possible to compare the id() of items in _indexed_groups and _snirf_names; indexed groups have methods such as appendGroup that identify their type; and it's also possible to compare sn._indexed_groups[0]._name to nirs, but all of these are clumsy and perhaps not recommended.

sstucker commented 7 months ago

Hi,

I'm a little unclear on a few points of your question.

Are you asking this because you want to use pysnirf or because you want to develop it?

sstucker commented 7 months ago

In many cases, the evaluation of a node requires knowledge of the SNIRF structure itself. I know this is not the answer you want as it is not future proof, but this is sort of how SNIRF works at the ground floor as it is based on HDF5, where everything is a key-value pair.

You can iterate through indexed groups.

You can identify the indexed groups at the next level of the tree by their names (SNIRF has a relatively limited number of indexed groups you can expect to find at each branch of the tree).

You could also check whether the instances are of type 'IndexedGroup'.

sstucker commented 7 months ago

I think the feature you want (list all keys of all elements of the SNIRF file as a tree) doesn't really exist. I think I can see why you would want this.

Note though that the recursion depth is kind of limited. It's the same tree for each 'nirs' group.

I think using the private bookkeeping collections you noticed, like '_snirf_names', to do something like this would work. Maybe we can work out a feature. What exactly are you trying to do?

zEdS15B3GCwq commented 7 months ago

Thanks for the feedback.

I'll take a step back by explaining what I'd like to do. I'm using Shimadzu LabNIRS, the output file of which isn't really well supported by most analysis tools. There are ways to import it into Homer etc., but that requires a few more steps and tools than I'd like. I've written a script to convert the data files to SNIRF, but that SNIRF data still has to be modified in some ways (e.g. adding probe 2D and/or 3D locations). I decided that having a GUI to inspect/edit file contents would be beneficial. My idea was to write that GUI to be SNIRF structure-agnostic, i.e. simply display the tree of content in the file and rely on pysnirf to take care of validation. It would be a waste of time to add the file model to my code when it already exists and is cared for by others.

Now comes the part where I'm usually told - hey, this tool already exists, don't waste your time! If that's so, I'll scratch my head and move on to another task happily. Perhaps I should've started with asking about this in the first place... And no, I don't want to convert it somehow & import it into a tool like Homer3 - I need this tool to make the conversion process better. And perhaps there are others who want a simple way of exploring SNIRF file contents.

Back to my original question, I think you understood what I meant perfectly. I want to walk the tree without knowledge of the SNIRF specification. It would be helpful if the nodes had a method that returned their children in an iterator or a dict. As long as there's a way to know which child is an indexed group (e.g. it's necessary to iterate it as well) and which is just an attribute (leaf), any approach is fine.

sstucker commented 7 months ago

The tool you're looking for might actually just be h5py, which this package uses under the hood. It doesn't assume anything SNIRF-related and can recursively give you the list of keys in each Group. Each key leads to either a Group or a Dataset (HDF5's term for what you call a leaf).
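As a rough sketch of that approach (not something pysnirf2 provides), a recursive walk with h5py could look like the following. The walk helper, the demo_tree function and the "demo.snirf" name are made up for illustration, and the group and dataset names only mimic a SNIRF layout. The file is created in memory, so nothing is written to disk.

```python
import h5py


def walk(group, prefix=""):
    """Yield (path, kind) for every Group and Dataset below `group`."""
    for name, node in group.items():
        path = f"{prefix}/{name}"
        if isinstance(node, h5py.Group):
            # A Group can have children of its own, so recurse into it.
            yield path, "group"
            yield from walk(node, path)
        elif isinstance(node, h5py.Dataset):
            # A Dataset is a leaf of the tree.
            yield path, "dataset"


def demo_tree():
    """Build a tiny in-memory HDF5 file and return its tree as a dict."""
    # driver="core" with backing_store=False keeps the file in memory only.
    with h5py.File("demo.snirf", "w", driver="core", backing_store=False) as f:
        # Intermediate groups (nirs1, data1, probe) are created automatically.
        f.create_dataset("nirs1/data1/time", data=[0.0, 0.1, 0.2])
        f.create_dataset("nirs1/probe/wavelengths", data=[760.0, 850.0])
        return dict(walk(f))


tree = demo_tree()
for path, kind in sorted(tree.items()):
    print(path, kind)
```

To walk a real file, open it read-only with h5py.File(path, "r") and pass the file object to walk. Note that this only exposes the raw HDF5 structure; it knows nothing about SNIRF validity.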

pysnirf2 goes to great lengths to hide this HDF5-like browsing from the user and instead wraps the SNIRF-specified Datasets in Python. This emulates the experience a lot of SNIRF users have opening up the files in MATLAB.

The feature you're looking for would be useful, I just didn't build it.

Let me know if this continues to come up and we can get serious about adding such a thing to this package.