dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

nwb - error: nwb_version not a proper semantic version #1485

Closed nrsc closed 2 months ago

nrsc commented 3 months ago

Hello,

I have a collection of NWB files that I exported to NWB v2 from pxp files with Igor/MIES. While validating the NWB files with dandi, I get an error saying the nwb_version is not a proper semantic version. Example as follows.

[pywnb.GENERIC] /home/nrsc/source_nwb/001065/QN23.26.010.11.01.02-compressed.nwb — error: nwb_version "b'2.2.4'" is not a proper semantic version. See http://semver.org

Cheers,

Scott

kabilar commented 3 months ago

Hi @nrsc, thanks for the report. I am not familiar with MIES, but I can try to help. Which version of pynwb is installed in your development environment? You can run pip list | grep nwb in your terminal window to find the version. It looks like PyPI does not have version 2.2.4 released.

Hi @CodyCBakerPhD, have you used Igor/MIES before? Thank you.

CodyCBakerPhD commented 3 months ago

It looks like PyPI does not have version 2.2.4 released.

The 2.2.4 reference is to the NWB schema version, not the pynwb API version.

2.2.4 is also pretty old by now (more than 4 years), so I'd strongly recommend updating the files to a newer schema version, which might just magically fix this issue.

[pywnb.GENERIC] ... error: nwb_version "b'2.2.4'" is not a proper semantic version. See http://semver.org

I can't recall how the DANDI CLI maps invalidations/errors/etc from NWB Inspector to their convention, but that's not a standard check that I know of

Most likely the call to NWBHDF5IO.read() on the file failed (that is, modern pynwb versions are unable to read the file).

kabilar commented 3 months ago

Thank you, @CodyCBakerPhD. What would be the easiest way to update the files to a newer schema version? Installing a newer version of Igor/MIES and exporting from the pxp files? (Assuming a newer version of pynwb and nwb-schema are supported by a later version of Igor/MIES.)

CodyCBakerPhD commented 3 months ago

I'd check with @rly (I think we saw a similar issue recently with someone trying to write a file using custom R code) but I think all string datasets just need to be written using encoding like utf-8

rly commented 3 months ago

This might be a bug in the MIES converter to NWB where the "nwb_version" attribute is written as ASCII instead of UTF-8, and the dandi semantic version validator expects UTF-8 (which is what the NWB schema says it should be).

@nrsc Can you share an example file to confirm? Or share a screenshot of HDFView when selecting the root of the file and selecting the "nwb_version" attribute on the root?
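For anyone without HDFView handy, the same information can be pulled out with h5py's low-level attribute API. A sketch (the function name is mine; "nwb_version" is the attribute named in the thread):

```python
import h5py

def describe_string_attr(path, name="nwb_version"):
    """Report the value, character set, and length class of a root attribute."""
    with h5py.File(path, "r") as f:
        # Drop to the low-level API to see the on-disk string type
        tid = h5py.h5a.open(f.id, name.encode()).get_type()
        return {
            "value": f.attrs[name],
            "charset": "UTF-8" if tid.get_cset() == h5py.h5t.CSET_UTF8 else "ASCII",
            "variable_length": tid.is_variable_str(),
        }
```

A fixed-length ASCII attribute would come back as, e.g., {"value": b"2.2.4", "charset": "ASCII", "variable_length": False}.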

nrsc commented 3 months ago

Will follow up on this tomorrow. Thanks for the insight. Have a good night everyone.

yarikoptic commented 3 months ago

ASCII instead of UTF-8

or really just the repr of the bytes after encoding:

❯ python -c 'print(repr("2.2.4".encode()))'
b'2.2.4'

but overall it does indeed smell like a bug in the software layer creating the .nwb here.

nrsc commented 3 months ago

Hey all.

@kabilar Thanks for checking in on this. As @CodyCBakerPhD mentioned, 2.2.4 is the NWB schema version.

@rly here is the version from HDFView

[screenshot: the nwb_version attribute on the file root in HDFView]

I'm not sure how much control I have over the version, @CodyCBakerPhD, since I've been using MIES to convert these pxp files to NWB; the export functions are built into the software. That said, I would be interested in finding a way to either update the current files I have or, if possible, mass-convert the pxp files using another scripting option.

Cheers

rly commented 3 months ago

It looks like nwb_version is a fixed-length string with UTF-8 encoding. All strings in NWB should be variable-length, so that's a bug in the MIES conversion. Can you raise an issue on https://github.com/AllenInstitute/MIES?

You can fix this for your existing files by reading that attribute and rewriting it as a variable-length string across all your files. It is possible that other strings in your files have the same issue. (namespace, neurodata_type, and object_id should probably also be variable-length strings, but those are not critical, and that spec is not written anywhere.)

Does the session_description dataset in the root of the file have a fixed-length or variable-length string dtype?
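The rewrite step above can be scripted with h5py. A minimal sketch, assuming the fix is only needed for the root nwb_version attribute (the function name is mine, and looping it over a directory of files is left to the reader):

```python
import h5py

def fix_attr(path, name="nwb_version"):
    """Re-write a root attribute as a variable-length UTF-8 string, in place."""
    with h5py.File(path, "r+") as f:
        value = f.attrs[name]
        if isinstance(value, bytes):  # fixed-length strings read back as bytes
            value = value.decode("utf-8")
        del f.attrs[name]
        # string_dtype() with no length -> variable-length UTF-8 string
        f.attrs.create(name, value, dtype=h5py.string_dtype(encoding="utf-8"))
```

Worth making a backup copy of each file first, since this modifies the files in place.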

nrsc commented 3 months ago

@rly I can look to fix the issue when writing the files. That seems like a good option for now.

As for session_description, it looks like it is a variable-length string, though I haven't written anything into it yet.

[screenshot: the session_description dataset in HDFView]

I will follow up with the MIES devs to let them know of this issue, and see whether I can get some more info on updating the NWB version export.

t-b commented 2 months ago

I'm one of the developers of MIES. We check all the data produced in CI against pynwb.validate, so if fixed-length strings are no longer accepted, the pynwb validator should complain. Same for the encoding: ASCII vs. UTF-8.

A similar issue was fixed in pynwb last year, https://github.com/NeurodataWithoutBorders/pynwb/issues/1668, so I'm surprised to see this bug.

Edit: We always use the latest released pynwb version for the validation step.

t-b commented 2 months ago

The bug in MIES has been fixed. We now also run dandi validate, with some ignore patterns, to catch similar issues.

IMHO this issue can be closed.