dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

Test suite failure on HDMF 3.14.4 #1494

Closed: CodyCBakerPhD closed this 2 months ago

CodyCBakerPhD commented 2 months ago

Found this in the NWB Inspector

We use the following sequence in the actions: https://github.com/NeurodataWithoutBorders/nwbinspector/blob/a7e7172d1e1cabcba703c0c1ada343ab0864f79a/.github/workflows/dandi-dev.yml#L26-L35, usually minus line 34

But without line 34 (forcible pin to previous HDMF version) this results in an error in dandi/tests/test_organize.py:test_ambiguous: https://github.com/NeurodataWithoutBorders/nwbinspector/actions/runs/10712486753/job/29702975698

Unsure what the true source of the problem is, though, or whether it's a problem to be concerned about in general; just wanted to let y'all know
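For context, the "forcible pin to previous HDMF version" mentioned above is just an extra install step in the workflow; a sketch of what such a step typically looks like (illustrative YAML, the actual line 34 may differ in wording and pinned version):

```yaml
# Hypothetical workflow step: force-install the last known-good HDMF
# before running the dandi test suite.
- name: Pin HDMF to the previous release
  run: pip install "hdmf==3.14.3"
```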

yarikoptic commented 2 months ago

so you are saying that with the new 3.14.4, something that worked before starts to fail? Sounds like a possible regression in HDMF... and indeed our CI has been full of red test fails for a while, largely masked by the fact that I was aware of some other services running red and was thus ignoring it... ok, time to look into it! That test started to fail on Aug 22 on dev-deps, which install development versions of hdmf and pynwb (and keyring and nwbinspector) at https://github.com/dandi/dandi-cli/blob/master/.github/workflows/test.yml#L71

dandi@drogon:/mnt/backup/dandi/tinuous-logs/dandi-cli/2024/08$ git grep 'test_ambiguous FAIL' | cat
22/github/cron/20240822T060345/44e097f/Tests/5792/16_test (ubuntu-latest, 3.8, dev-deps).txt:2024-08-22T06:22:42.4039246Z dandi/tests/test_organize.py::test_ambiguous FAILED
22/github/cron/20240822T060345/44e097f/Tests/5792/test (ubuntu-latest, 3.8, dev-deps)/8_Run all tests.txt:2024-08-22T06:22:42.4039201Z dandi/tests/test_organize.py::test_ambiguous FAILED

the first non-dev-deps fail happened on Sep 05

dandi@drogon:/mnt/backup/dandi/tinuous-logs/dandi-cli/2024/09$ git grep 'test_ambiguous FAIL' | grep -v dev-deps
05/github/cron/20240905T060358/44e097f/Tests/5806/0_test (windows-2019, 3.8, normal).txt:2024-09-05T06:12:38.5921668Z dandi/tests/test_organize.py::test_ambiguous FAILED
05/github/cron/20240905T060358/44e097f/Tests/5806/10_test (macos-latest, 3.8, normal).txt:2024-09-05T06:10:45.2083220Z dandi/tests/test_organize.py::test_ambiguous FAILED

with hdmf

dandi@drogon:/mnt/backup/dandi/tinuous-logs/dandi-cli/2024/09$ grep 'hdmf' 05/github/cron/20240905T060358/44e097f/Tests/5806/10_test\ \(macos-latest\,\ 3.8\,\ normal\).txt 
2024-09-05T06:06:55.3316630Z Collecting hdmf!=3.5.0 (from dandi==0.63.0+11.g44e097f9)
2024-09-05T06:06:55.3363050Z   Downloading hdmf-3.14.4-py3-none-any.whl.metadata (8.8 kB)

whereas the prior day had

dandi@drogon:/mnt/backup/dandi/tinuous-logs/dandi-cli/2024/09$ grep 'hdmf' 04/github/cron/*/*/Tests/*/10_test\ \(macos-latest\,\ 3.8\,\ normal\).txt 
2024-09-04T06:06:43.3920200Z Collecting hdmf!=3.5.0 (from dandi==0.63.0+11.g44e097f9)
2024-09-04T06:06:43.3957280Z   Downloading hdmf-3.14.3-py3-none-any.whl.metadata (8.8 kB)

so indeed the hdmf release of changes from around Aug 22 seems to be to blame ;)

so it is likely one of those 3 changes:

```shell
commit e0bedca13f167d55a4be5657044c4c6697de95ca
Author: Matthew Avaylon
Date:   Thu Aug 22 08:45:29 2024 -0700

    Append a Dataset of References (#1135)

commit acc3d78cc5a828ddd384cca814ef60167ae92682
Author: Steph Prince <40640337+stephprince@users.noreply.github.com>
Date:   Wed Aug 21 22:14:24 2024 -0700

    Write scalar datasets with compound data type (#1176)

    * add support for scalar compound datasets
    * add scalar compound dset io and validation tests
    * update CHANGELOG.md
    * Update tests/unit/test_io_hdf5_h5tools.py

    Co-authored-by: Ryan Ly

    * update container repr conditionals

    ---------

    Co-authored-by: Ryan Ly

commit 2b167aedc8a8f58afd75d3d0c750f6d620dc663d
Author: Steph Prince <40640337+stephprince@users.noreply.github.com>
Date:   Wed Aug 21 13:48:42 2024 -0700

    Add support to write multidimensional string arrays (#1173)

    * add condition for multidim string arrays
    * add tests for multidim string array build
    * update condition when defining hdf5 dataset shape
    * add test to write multidim string array
    * update CHANGELOG.md
    * fix text decoding in test
    * add recursive string type for arrays of arbitrary dim
    * add test for compound data type with strings
    * add tests for multidim str attributes
    * fix line lengths
    * [pre-commit.ci] auto fixes from pre-commit.com hooks

      for more information, see https://pre-commit.ci
    * update compound dtype test

    ---------

    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Ryan Ly
```

since it is not clear which one it is, I will now look into the actual test failure and see if I can pinpoint the effect more specifically

yarikoptic commented 2 months ago

bisected to https://github.com/hdmf-dev/hdmf/commit/2b167aedc8a8f58afd75d3d0c750f6d620dc663d
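For readers unfamiliar with the workflow: a bisection like this is typically done with `git bisect run`, using the failing test as the probe. A toy, self-contained sketch of the technique (throwaway repo, file names, and probe are all illustrative, not the actual hdmf/dandi setup, where the probe would be something like `pip install -e . && pytest dandi/tests/test_organize.py::test_ambiguous`):

```shell
# Build a throwaway repo with a known-good history and a deliberate regression,
# then let git bisect find the first bad commit automatically.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci
git commit -q --allow-empty -m "baseline"
echo "list" > behavior.txt                    # good behavior: strings kept as a list
git add behavior.txt
git commit -q -m "good change"
echo "scalar" > behavior.txt                  # the regression flips the behavior
git commit -q -am "regression: write strings as scalar"
echo "unrelated" > other.txt
git add other.txt
git commit -q -m "later unrelated change"
# HEAD is known bad, three commits back is known good; the probe exits 0 on good
git bisect start HEAD HEAD~3
git bisect run sh -c 'grep -q list behavior.txt' | tee bisect.log
git bisect reset
```

`git bisect run` treats probe exit code 0 as "good" and 1-127 (except 125, which means "skip") as "bad", so any failing test command can drive the search.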

yarikoptic commented 2 months ago

it is about

2024-09-05 14:44:24,500 [ WARNING] Completely empty record for /home/yoh/.tmp/pytest-of-yoh/pytest-786/test_ambiguous0/simple2.nwb

and it is related to creation of that file, not loading etc... And since it complained only about that one, and not the other one, it is about us "copying" an nwb via NWBHDF5IO in https://github.com/dandi/dandi-cli/blob/HEAD/dandi/pynwb_utils.py#L494 .

    with pynwb.NWBHDF5IO(src, "r") as ior, pynwb.NWBHDF5IO(dest, "w") as iow:
        data = ior.read()
        data.generate_new_id()
        iow.export(ior, nwbfile=data)

as a result of that commit, we are getting a new file...

and there are 12 more lines in the diff between the h5dump diffs of the original and the "copied" file:

--- /tmp/diff-good  2024-09-05 14:58:33.994177871 -0400
+++ /tmp/diff-bad   2024-09-05 14:58:28.530140108 -0400
@@ -1,8 +1,8 @@
---- /home/yoh/.tmp/pytest-of-yoh/pytest-787/simple20/simple2.dump  2024-09-05 14:52:36.671721691 -0400
-+++ /home/yoh/.tmp/pytest-of-yoh/pytest-787/test_ambiguous0/simple2.dump   2024-09-05 14:52:48.967804648 -0400
+--- /home/yoh/.tmp/pytest-of-yoh/pytest-785/simple20/simple2.dump  2024-09-05 14:57:09.189592043 -0400
++++ /home/yoh/.tmp/pytest-of-yoh/pytest-785/test_ambiguous0/simple2.dump   2024-09-05 14:56:55.105494805 -0400
 @@ -1,5 +1,14 @@
--HDF5 "/home/yoh/.tmp/pytest-of-yoh/pytest-787/simple20/simple2.nwb" {
-+HDF5 "/home/yoh/.tmp/pytest-of-yoh/pytest-787/test_ambiguous0/simple2.nwb" {
+-HDF5 "/home/yoh/.tmp/pytest-of-yoh/pytest-785/simple20/simple2.nwb" {
++HDF5 "/home/yoh/.tmp/pytest-of-yoh/pytest-785/test_ambiguous0/simple2.nwb" {
  GROUP "/" {
 +   ATTRIBUTE ".specloc" {
 +      DATATYPE  H5T_REFERENCE { H5T_STD_REF_OBJECT }
@@ -20,17 +20,29 @@
        }
        DATASPACE  SCALAR
        DATA {
--      (0): "8e7035b9-1bcf-4cf2-a54a-1fe81350dc21"
-+      (0): "156cfbc4-e3c9-4e01-9a68-8d2e36dbf9a1"
+-      (0): "d5c1f483-a32e-4da3-9eac-578581e5628a"
++      (0): "fb7e80c0-bd56-4c99-9266-3c3b63e479c2"
        }
     }
     GROUP "acquisition" {
+@@ -108,9 +117,9 @@
+             CSET H5T_CSET_UTF8;
+             CTYPE H5T_C_S1;
+          }
+-         DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
++         DATASPACE  SCALAR
+          DATA {
+-         (0): "keyword1", "keyword 2"
++         (0): "<StrDataset for HDF5 dataset "keywords": shape (2,), type "|O">"
+          }
+       }
+       DATASET "lab" {
 @@ -183,7 +192,7 @@
              }
              DATASPACE  SCALAR
              DATA {
--            (0): "695e6d7d-c000-4a9e-bf89-a86a8fa55daf"
-+            (0): "15aa5c61-0cf1-462e-9a2e-6ce7f96afebe"
+-            (0): "02854e20-0941-4078-9151-cfc27c312d9f"
++            (0): "ff22df57-a0e5-46c2-8b60-9c25f19ce818"
              }
           }
           DATASET "date_of_birth" {

and that

❯ grep StrDataset /home/yoh/.tmp/pytest-of-yoh/pytest-785/simple20/simple2.dump /home/yoh/.tmp/pytest-of-yoh/pytest-785/test_ambiguous0/simple2.dump
/home/yoh/.tmp/pytest-of-yoh/pytest-785/test_ambiguous0/simple2.dump:         (0): "<StrDataset for HDF5 dataset "keywords": shape (2,), type "|O">"

is quite a unique "thing" to be present at all, and only in the "copy" of the file...
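The symptom can be reproduced in pure Python without HDF5 at all: a lazy string-dataset wrapper whose `repr` leaks into the written value when export code stringifies the dataset object instead of materializing its elements. A minimal sketch (the class below is a hypothetical stand-in for h5py/hdmf's `StrDataset`, not the real implementation):

```python
class FakeStrDataset:
    """Illustrative stand-in for the lazy string-dataset wrapper returned on read."""

    def __init__(self, name, values):
        self.name = name
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def __repr__(self):
        # Mimics the repr that ended up inside the copied file
        return (f'<StrDataset for HDF5 dataset "{self.name}": '
                f'shape ({len(self._values)},), type "|O">')


keywords = FakeStrDataset("keywords", ["keyword1", "keyword 2"])

# A correct export materializes the elements, preserving the (2,) shape:
materialized = list(keywords)

# The regression effectively wrote str(dataset) -- i.e. its repr -- as a
# single scalar string, which matches the h5dump diff above (DATASPACE
# changed from SIMPLE { ( 2 ) } to SCALAR):
buggy_scalar = str(keywords)

print(materialized)
print(buggy_scalar)
```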

and damn, we are somehow managing to hide such an error message/exception from the user even at debug level, but the underlying problem is that HDMF is unable to load such a file:

(Pdb) p get_metadata(paths[0])
2024-09-05 15:05:22,983 [   DEBUG] Creating converter from 3 to 5
2024-09-05 15:05:23,032 [   DEBUG]         Calling override function for constructor argument 'age__reference'
2024-09-05 15:05:23,032 [   DEBUG]         Calling override function for constructor argument 'date_of_birth'
2024-09-05 15:05:23,044 [   DEBUG]         Calling override function for constructor argument 'session_start_time'
2024-09-05 15:05:23,044 [   DEBUG]         Calling override function for constructor argument 'file_create_date'
2024-09-05 15:05:23,045 [   DEBUG]         Calling override function for constructor argument 'timestamps_reference_time'
2024-09-05 15:05:23,045 [   DEBUG]         Calling override function for constructor argument 'experimenter'
2024-09-05 15:05:23,045 [   DEBUG]         Calling override function for constructor argument 'related_publications'
2024-09-05 15:05:23,045 [   DEBUG]         Calling override function for constructor argument 'scratch'
*** hdmf.build.errors.ConstructError: (root GroupBuilder {'attributes': {'namespace': 'core', 'neurodata_type': 'NWBFile', 'nwb_version': '2.7.0', 'object_id': 'd5e51f36-ccf4-4a5d-bfe9-0cfe2200332f'}, 'groups': {'acquisition': root/acquisition GroupBuilder {'attributes': {}, 'groups': {}, 'datasets': {}, 'links': {}}, 'analysis': root/analysis GroupBuilder {'attributes': {}, 'groups': {}, 'datasets': {}, 'links': {}}, 'general': root/general GroupBuilder {'attributes': {}, 'groups': {'subject': root/general/subject GroupBuilder {'attributes': {'namespace': 'core', 'neurodata_type': 'Subject', 'object_id': 'ad534e99-07d7-4a45-86ed-21510c5d4f1c'}, 'groups': {}, 'datasets': {'date_of_birth': root/general/subject/date_of_birth DatasetBuilder {'attributes': {}, 'data': '2016-12-01T00:00:00+00:00'}, 'sex': root/general/subject/sex DatasetBuilder {'attributes': {}, 'data': 'U'}, 'species': root/general/subject/species DatasetBuilder {'attributes': {}, 'data': 'Mus musculus'}, 'subject_id': root/general/subject/subject_id DatasetBuilder {'attributes': {}, 'data': 'mouse001'}}, 'links': {}}}, 'datasets': {'experiment_description': root/general/experiment_description DatasetBuilder {'attributes': {}, 'data': 'experiment_description1'}, 'experimenter': root/general/experimenter DatasetBuilder {'attributes': {}, 'data': <StrDataset for Closed HDF5 dataset>}, 'institution': root/general/institution DatasetBuilder {'attributes': {}, 'data': 'institution1'}, 'keywords': root/general/keywords DatasetBuilder {'attributes': {}, 'data': '<StrDataset for HDF5 dataset "keywords": shape (2,), type "|O">'}, 'lab': root/general/lab DatasetBuilder {'attributes': {}, 'data': 'lab1'}, 'related_publications': root/general/related_publications DatasetBuilder {'attributes': {}, 'data': <StrDataset for Closed HDF5 dataset>}, 'session_id': root/general/session_id DatasetBuilder {'attributes': {}, 'data': 'session_id1'}}, 'links': {}}, 'processing': root/processing GroupBuilder {'attributes': {}, 
'groups': {}, 'datasets': {}, 'links': {}}, 'stimulus': root/stimulus GroupBuilder {'attributes': {}, 'groups': {'presentation': root/stimulus/presentation GroupBuilder {'attributes': {}, 'groups': {}, 'datasets': {}, 'links': {}}, 'templates': root/stimulus/templates GroupBuilder {'attributes': {}, 'groups': {}, 'datasets': {}, 'links': {}}}, 'datasets': {}, 'links': {}}}, 'datasets': {'file_create_date': root/file_create_date DatasetBuilder {'attributes': {}, 'data': <Closed HDF5 dataset>}, 'identifier': root/identifier DatasetBuilder {'attributes': {}, 'data': '43eb6df40f18426ab60ffea2ea05a4c8'}, 'session_description': root/session_description DatasetBuilder {'attributes': {}, 'data': 'session_description1'}, 'session_start_time': root/session_start_time DatasetBuilder {'attributes': {}, 'data': '2017-04-15T12:00:00+00:00'}, 'timestamps_reference_time': root/timestamps_reference_time DatasetBuilder {'attributes': {}, 'data': '2017-04-15T12:00:00+00:00'}}, 'links': {}}, "Could not construct NWBFile object due to: NWBFile.__init__: incorrect type for 'keywords' (got 'str', expected 'ndarray, list, tuple, Dataset, Array, StrDataset, HDMFDataset or AbstractDataChunkIterator')")

I will for now declare this version of hdmf buggy for dandi-cli and file an issue with HDMF.
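"Declaring buggy" presumably means extending the existing version exclusion in the install requirements; a sketch, assuming the setup.cfg layout implied by the pip logs above (which show `hdmf!=3.5.0`) -- the exact pin dandi-cli shipped may differ:

```ini
# setup.cfg (illustrative fragment): exclude the regressed release alongside
# the already-excluded 3.5.0
install_requires =
    hdmf != 3.5.0, != 3.14.4
```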

yarikoptic commented 2 months ago

Issues filed:

github-actions[bot] commented 2 months ago

:rocket: Issue was released in 0.63.1 :rocket: