NeurodataWithoutBorders / nwbinspector

Tool to help inspect NWB files for compliance with NWB Best Practices
https://nwbinspector.readthedocs.io/

[Bug]: File count in output includes directories #359

Open rly opened 1 year ago

rly commented 1 year ago

What happened?

When the check_unique_identifiers issue is raised, the directory is counted as a "file" in the output line "Found 21 issues over 3 files". This can be confusing to users (see comment https://github.com/dandi/dandi-cli/issues/1266#issuecomment-1498924935). I'm not sure if there are other directory-level issues like this.

Suggestion: change "Found 21 issues over 3 files" to "Found 21 issues over 2 files and 1 directory" or "Found 21 issues over 3 files and directories".
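
A minimal sketch of the first suggested wording, not the inspector's actual code. The function name format_summary is hypothetical, the example paths are made up, and the sketch assumes that any reported path without a ".nwb" suffix is a directory.

```python
from pathlib import PurePosixPath


def format_summary(num_issues: int, reported_paths: list[str]) -> str:
    """Build a summary line that counts files and directories separately.

    Assumption: a reported path that does not end in ".nwb" is a directory.
    """
    unique_paths = set(reported_paths)
    files = {p for p in unique_paths if PurePosixPath(p).suffix == ".nwb"}
    num_dirs = len(unique_paths) - len(files)

    def count(n: int, singular: str, plural: str) -> str:
        return f"{n} {singular if n == 1 else plural}"

    parts = [count(len(files), "file", "files")]
    if num_dirs:
        parts.append(count(num_dirs, "directory", "directories"))
    return f"Found {num_issues} issues over {' and '.join(parts)}"


# Hypothetical paths: two NWB files plus the directory that holds them.
example_paths = [
    "000037/sub-408021/first.nwb",
    "000037/sub-408021/second.nwb",
    "000037/sub-408021",
]
print(format_summary(21, example_paths))
```

When no directory-level path is reported, the line keeps its current shape and only the file count is shown.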

**************************************************
NWBInspector Report Summary

Timestamp: 2023-04-11 13:52:44.461154-07:00
Platform: macOS-13.2.1-arm64-i386-64bit
NWBInspector version: 0.4.27

Found 21 issues over 3 files:
       1 - CRITICAL
      10 - BEST_PRACTICE_VIOLATION
      10 - BEST_PRACTICE_SUGGESTION
**************************************************

0  CRITICAL
===========

0.0  /Users/rly/Documents/NWB_Data/dandisets/000037/sub-408021: check_unique_identifiers - 'NWBFile' object at location '/'
       Message: The identifier '758519303_with_stim' is used across the .nwb files: ['sub-408021_ses-758519303_behavior+image+ophys.nwb', 'sub-408021_ses-758519303_behavior+image+ophys_new_IDs.nwb']. The identifier of any NWBFile should be a completely unique value - we recommend using uuid4 to achieve this.

Operating System

M1 macOS

Python Version

3.11

Were you streaming with ROS3?

No

Package Versions

Copy and paste your output here

CodyCBakerPhD commented 1 year ago

Can you please attach a .txt or .json of the full report? You may have to use the --detailed flag as well to prevent aggregation.

I'm having trouble reproducing the issue - the number of files displayed in that header is simply the number of unique NWB files that had messages returned: https://github.com/NeurodataWithoutBorders/nwbinspector/blob/dev/src/nwbinspector/inspector_tools.py#L118
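
To illustrate how a count of unique paths could end up including a directory, here is a small sketch. The Message class, its file_path field, and the example paths are hypothetical stand-ins, not the inspector's real data model; the sketch only assumes that the count is taken over the distinct path recorded on each message.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for an inspector message; not the real class."""

    check_name: str
    file_path: str


def count_reported_paths(messages: list[Message]) -> int:
    """Count the distinct paths that appear on at least one message."""
    return len({message.file_path for message in messages})


messages = [
    Message("check_a", "000037/sub-408021/first.nwb"),
    Message("check_b", "000037/sub-408021/first.nwb"),
    Message("check_a", "000037/sub-408021/second.nwb"),
    # A directory-level check records the directory itself as its path.
    Message("check_unique_identifiers", "000037/sub-408021"),
]
print(count_reported_paths(messages))  # 3
```

Under that assumption, two files plus one directory-level message would be reported as 3 "files".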

So what I'm looking for in the full report is all the messages from all the files to confirm they appear as expected

I could technically add some directory logic to the string (for example, by evaluating the unique parents of each file), but TBH I don't think it would be all that helpful. The inspector and its reports have no notion of sub-directories (or even memory of the top-level directory), only files, and files are found by recursive globbing from the main CLI calls.
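
For reference, the "unique parents of each file" idea could be sketched roughly as follows. The helper name unique_parents and the example paths are hypothetical, and the sketch works on path strings only, without touching the filesystem.

```python
from pathlib import PurePosixPath


def unique_parents(file_paths: list[str]) -> list[str]:
    """Return the sorted, de-duplicated parent directories of the given files."""
    return sorted({str(PurePosixPath(path).parent) for path in file_paths})


# Hypothetical file paths, as recursive globbing might return them.
found_files = [
    "000037/sub-408021/first.nwb",
    "000037/sub-408021/second.nwb",
    "000037/sub-408022/third.nwb",
]
print(unique_parents(found_files))
```

This only recovers directories that directly contain a reported file; it would not say whether a given message was raised at the file level or the directory level.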