bids-standard / pybids

Python tools for querying and manipulating BIDS datasets.
https://bids-standard.github.io/pybids/
MIT License

MacOS Finder `._*` hidden metadata files cause pybids to crash #1069

Open shnizzedy opened 1 month ago

shnizzedy commented 1 month ago

Example

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte while trying to decode JSON from file […]/._sub-PA069_ses-V1W1_task-poke_run-2_bold.json

```Python
Traceback (most recent call last):
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/index.py", line 303, in load_json
    return json.load(handle)
  File "/opt/conda/envs/sdcflows/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/opt/conda/envs/sdcflows/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/sdcflows/bin/sdcflows", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/cli/main.py", line 39, in main
    parse_args(argv)
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/cli/parser.py", line 281, in parse_args
    config.from_dict(vars(opts))
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/config.py", line 589, in from_dict
    execution.load(settings)
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/config.py", line 249, in load
    cls.init()
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/sdcflows/config.py", line 476, in init
    cls._layout = BIDSLayout(
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/layout.py", line 177, in __init__
    _indexer(self)
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/index.py", line 154, in __call__
    self._index_metadata()
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/index.py", line 415, in _index_metadata
    file_md.update(pl())
  File "/opt/conda/envs/sdcflows/lib/python3.10/site-packages/bids/layout/index.py", line 305, in load_json
    raise OSError(
OSError: Error occurred while trying to decode JSON from file /ocean/projects/med220004p/shared/data_raw/vannucci/bids_raw/sub-PA069/ses-V1W1/func/._sub-PA069_ses-V1W1_task-poke_run-2_bold.json
```

Proposed Solution

I think these types of files (`._*` and `.DS_Store`) can be safely ignored.
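A name-based check for these files could look roughly like the sketch below. It uses only the standard library; the helper names `is_macos_metadata` and `iter_data_files` are hypothetical and not part of PyBIDS, and the sketch does not show how PyBIDS's own indexer would apply such a filter.

```python
import os
from pathlib import Path


def is_macos_metadata(path):
    """Return True for AppleDouble sidecars (`._*`) and `.DS_Store` files.

    Only the final path component is inspected, so the check works at any
    depth in the dataset. (Hypothetical helper, not a PyBIDS function.)
    """
    name = Path(path).name
    return name.startswith("._") or name == ".DS_Store"


def iter_data_files(root):
    """Yield every file under `root`, skipping macOS metadata files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not is_macos_metadata(filename):
                yield Path(dirpath) / filename
```

With a filter like this in place, a file such as `._sub-PA069_ses-V1W1_task-poke_run-2_bold.json` would never reach the JSON loader, while the real `sub-PA069_ses-V1W1_task-poke_run-2_bold.json` sidecar would still be read.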

Context

In analyzing someone else's read-only (to me) data, I hit this issue. I worked around it by creating a symlinked recreation of the data directory without the MacOS hidden metadata files, but I don't think that should have been necessary.
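For anyone who needs the same workaround, here is a minimal sketch of how such a symlinked copy might be built. It is not the exact script I used; the function name `mirror_without_macos_metadata` is made up for illustration, and it assumes you can write to the destination and that the source tree contains no symlinked subdirectories you need followed.

```python
import os
from pathlib import Path


def mirror_without_macos_metadata(src, dst):
    """Recreate the directory tree of `src` under `dst`.

    Directories are created as real directories; every file is a symlink
    back to the original, except `._*` and `.DS_Store`, which are skipped.
    Returns the list of symlinks created. (Hypothetical helper.)
    """
    src = Path(src).resolve()
    dst = Path(dst)
    linked = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(src):
        target_dir = dst / Path(dirpath).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            if filename.startswith("._") or filename == ".DS_Store":
                continue  # skip macOS Finder metadata
            link = target_dir / filename
            link.symlink_to(Path(dirpath) / filename)
            linked.append(link)
    return linked
```

The read-only source is never modified; only the destination tree is written, and it can then be passed to the tool in place of the original dataset root.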

bids-validator raised errors and warnings for the dataset, but none related to these hidden metadata files as far as I can tell:

1: [ERR] Files with such naming scheme are not part of BIDS specification. This error is most commonly caused by typos in file names that make them not BIDS compatible. Please consult the specification and make sure your files are named correctly. If this is not a file naming issue (for example when including files not yet covered by the BIDS specification) you should include a ".bidsignore" file in your dataset (see https://github.com/bids-standard/bids-validator#bidsignore for details). Please note that derived (processed) data should be placed in /derivatives folder and source data (such as DICOMS or behavioural logs in proprietary formats) should be placed in the /sourcedata folder. (code: 1 - NOT_INCLUDED)
      ./sub-PA028/ses-V2W2/files.txt
          Evidence: files.txt
      ./sub-PA070/ses-V1W1/anat/sub-PA070_ses-V2W2_acq-MPR_rec-vNavNorm_T1w.nii.gz
          Evidence: sub-PA070_ses-V2W2_acq-MPR_rec-vNavNorm_T1w.nii.gz
2: [ERR] 'IntendedFor' field needs to point to an existing file. (code: 37 - INTENDED_FOR)
3: [ERR] You have to define 'TaskName' for this file. (code: 50 - TASK_NAME_MUST_DEFINE)
4: [ERR] Session label in the filename doesn't match with the path of the file. File seems to be saved in incorrect session directory. (code: 65 - SESSION_LABEL_IN_FILENAME_DOESNOT_MATCH_DIRECTORY)
5: [ERR] _T1w.nii[.gz] files must have exactly three dimensions.  (code: 95 - T1W_FILE_WITH_TOO_MANY_DIMENSIONS)
1: [WARN] Task scans should have a corresponding events.tsv file. If this is a resting state scan you can ignore this warning or rename the task to include the word "rest". (code: 25 - EVENTS_TSV_MISSING)
2: [WARN] Not all subjects contain the same files. Each subject should contain the same number of files with the same naming unless some files are known to be missing. (code: 38 - INCONSISTENT_SUBJECTS)
3: [WARN] Not all subjects/sessions/runs have the same scanning parameters. (code: 39 - INCONSISTENT_PARAMETERS)
4: [WARN] NIfTI file's header field for pixel dimension information empty or too short. (code: 42 - NIFTI_PIXDIM)
5: [WARN] There are files in the /stimuli directory that are not utilized in any _events.tsv file. (code: 77 - UNUSED_STIMULUS)
6: [WARN] Tabular file contains custom columns not described in a data dictionary (code: 82 - CUSTOM_COLUMN_WITHOUT_DESCRIPTION)
7: [WARN] The onset of the last event is after the total duration of the corresponding scan. This design is suspiciously long.  (code: 85 - SUSPICIOUSLY_LONG_EVENT_DESIGN)
8: [WARN] Not all subjects contain the same sessions. (code: 97 - MISSING_SESSION)
9: [WARN] The recommended file /README is missing. See Section 03 (Modality agnostic files) of the BIDS specification. (code: 101 - README_FILE_MISSING)
10: [WARN] The Authors field of dataset_description.json should contain an array of fields - with one author per field. This was triggered because there are no authors, which will make DOI registration from dataset metadata impossible. (code: 113 - NO_AUTHORS)

bids-validator reports the same errors and warnings for my workaround data directory, but PyBIDS indexes that directory without crashing.

effigies commented 2 weeks ago

I agree that we should generally ignore dotfiles.