bids-standard / bids-validator

Validator for the Brain Imaging Data Structure
https://bids-standard.github.io/bids-validator/
MIT License

Path Issues; Validator Unable to Locate Files or Folders if Unwise Characters Present in Folder Name #1495

Open bendhouseart opened 2 years ago

bendhouseart commented 2 years ago

I was recently running the validator on a dataset, but after renaming the dataset folder to better reflect the accompanying paper, the validator began failing to locate common files, subject folders, etc.

The folder name and the output from running the validator are included below.

Renaming the folder to something more sensible seemed to resolve the issue. It also appears that OpenNeuro applied its own sanitization to the errant path name after upload. I should have the dataset published/snapshotted, but let me know if you would like me to send you a reviewer link or add you to the dataset there in the meantime.
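For anyone who needs the same workaround, here is a minimal sketch of one way to sanitize a folder name before running the validator. The helper name `sanitize_folder_name` and the allow-list of characters are assumptions made for illustration; this is not what OpenNeuro does internally, which is not documented in this thread.

```python
import re

# Hypothetical allow-list: letters, digits, dot, underscore, hyphen.
# Anything else (for example "[" and "]") is treated as unwise.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_folder_name(name: str) -> str:
    """Return a copy of `name` with every unwise character replaced by '_'."""
    return _UNSAFE.sub("_", name)


original = (
    "First-in-human_evaluation_of_[11C]PS13_a_novel_PET_radioligand_"
    "to_quantify_cyclooxygenase-1_in_the_brain"
)
print(sanitize_folder_name(original))
```

Replacing rather than dropping the brackets keeps the renamed folder recognizably close to the original name.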

The bug can be reproduced by renaming a valid BIDS dataset to the folder name below.


Validator Version: 1.9.7

Folder Name: First-in-human_evaluation_of_[11C]PS13_a_novel_PET_radioligand_to_quantify_cyclooxygenase-1_in_the_brain

Folder Structure:

machine:Data user$ tree First-in-human_evaluation_of_\[11C\]PS13_a_novel_PET_radioligand_to_quantify_cyclooxygenase-1_in_the_brain/
First-in-human_evaluation_of_[11C]PS13_a_novel_PET_radioligand_to_quantify_cyclooxygenase-1_in_the_brain/
├── LICENSE
├── README
├── dataset_description.json
├── participants.json
├── participants.tsv
├── sub-PS11
│   ├── ses-baselinebrain
│   │   ├── anat
│   │   │   └── sub-PS11_ses-baselinebrain_T1w.nii.gz
│   │   └── pet
│   │       ├── sub-PS11_ses-baselinebrain_rec-Dyn_pet.json
│   │       └── sub-PS11_ses-baselinebrain_rec-Dyn_pet.nii.gz
│   └── ses-rescanbrain
│       └── pet
│           ├── sub-PS11_ses-rescanbrain_rec-Dyn_pet.json
│           └── sub-PS11_ses-rescanbrain_rec-Dyn_pet.nii.gz
├── sub-PS17
│   ├── ses-baselinebrain
│   │   ├── anat
│   │   │   └── sub-PS17_ses-baselinebrain_T1w.nii.gz
│   │   └── pet
│   │       ├── sub-PS17_ses-baselinebrain_rec-DynTOF_pet.json
│   │       └── sub-PS17_ses-baselinebrain_rec-DynTOF_pet.nii.gz
│   └── ses-rescanbrain
│       └── pet
│           ├── sub-PS17_ses-rescanbrain_rec-DynTOF_pet.json
│           └── sub-PS17_ses-rescanbrain_rec-DynTOF_pet.nii.gz
├── sub-PS19
│   ├── ses-baselinebrain
│   │   ├── anat
│   │   │   └── sub-PS19_ses-baselinebrain_T1w.nii.gz
│   │   └── pet
│   │       ├── sub-PS19_ses-baselinebrain_rec-DynTOF_pet.json
│   │       └── sub-PS19_ses-baselinebrain_rec-DynTOF_pet.nii.gz
│   └── ses-rescanbrain
│       └── pet
│           ├── sub-PS19_ses-rescanbrain_rec-DynTOF_pet.json
│           └── sub-PS19_ses-rescanbrain_rec-DynTOF_pet.nii.gz
├── sub-PS20
│   ├── ses-baselinebrain
│   │   ├── anat
│   │   │   └── sub-PS20_ses-baselinebrain_T1w.nii.gz
│   │   └── pet
│   │       ├── sub-PS20_ses-baselinebrain_rec-DynTOF_pet.json
│   │       └── sub-PS20_ses-baselinebrain_rec-DynTOF_pet.nii.gz
│   └── ses-rescanbrain
│       └── pet
│           ├── sub-PS20_ses-rescanbrain_rec-DynTOF_pet.json
│           └── sub-PS20_ses-rescanbrain_rec-DynTOF_pet.nii.gz
├── sub-PS21
│   ├── ses-baselinebrain
│   │   ├── anat
│   │   │   └── sub-PS21_ses-baselinebrain_T1w.nii.gz
│   │   └── pet
│   │       ├── sub-PS21_ses-baselinebrain_rec-DynTOF_pet.json
│   │       └── sub-PS21_ses-baselinebrain_rec-DynTOF_pet.nii.gz
│   └── ses-rescanbrain
│       └── pet
│           ├── sub-PS21_ses-rescanbrain_rec-DynTOF_pet.json
│           └── sub-PS21_ses-rescanbrain_rec-DynTOF_pet.nii.gz
├── sub-PS23
│   ├── ses-baselinebrain
│   │   ├── anat
│   │   │   └── sub-PS23_ses-baselinebrain_T1w.nii.gz
│   │   └── pet
│   │       ├── sub-PS23_ses-baselinebrain_rec-DynTOF_pet.json
│   │       └── sub-PS23_ses-baselinebrain_rec-DynTOF_pet.nii.gz
│   └── ses-rescanbrain
│       └── pet
│           ├── sub-PS23_ses-rescanbrain_rec-DynTOF_pet.json
│           └── sub-PS23_ses-rescanbrain_rec-DynTOF_pet.nii.gz
├── sub-PS24
│   ├── ses-baselinebrain
│   │   ├── anat
│   │   │   └── sub-PS24_ses-baselinebrain_T1w.nii.gz
│   │   └── pet
│   │       ├── sub-PS24_ses-baselinebrain_rec-DynTOF_pet.json
│   │       └── sub-PS24_ses-baselinebrain_rec-DynTOF_pet.nii.gz
│   └── ses-rescanbrain
│       └── pet
│           ├── sub-PS24_ses-rescanbrain_rec-DynTOF_pet.json
│           └── sub-PS24_ses-rescanbrain_rec-DynTOF_pet.nii.gz
├── sub-PS26
│   ├── ses-baselinebrain
│   │   ├── anat
│   │   │   └── sub-PS26_ses-baselinebrain_T1w.nii.gz
│   │   └── pet
│   │       ├── sub-PS26_ses-baselinebrain_rec-DynTOF_pet.json
│   │       └── sub-PS26_ses-baselinebrain_rec-DynTOF_pet.nii.gz
│   └── ses-rescanbrain
│       └── pet
│           ├── sub-PS26_ses-rescanbrain_rec-DynTOF_pet.json
│           └── sub-PS26_ses-rescanbrain_rec-DynTOF_pet.nii.gz
└── sub-PS39
    ├── ses-baselinebrain
    │   ├── anat
    │   │   ├── sub-PS39_ses-baselinebrain_T1w.json
    │   │   └── sub-PS39_ses-baselinebrain_T1w.nii.gz
    │   └── pet
    │       ├── sub-PS39_ses-baselinebrain_rec-DynTOF_pet.json
    │       └── sub-PS39_ses-baselinebrain_rec-DynTOF_pet.nii.gz
    └── ses-rescanbrain
        └── pet
            ├── sub-PS39_ses-rescanbrain_rec-DynTOF_pet.json
            └── sub-PS39_ses-rescanbrain_rec-DynTOF_pet.nii.gz

54 directories, 51 files

Output from Validator:

machine:First-in-human_evaluation_of_[11C]PS13_a_novel_PET_radioligand_to_quantify_cyclooxygenase-1_in_the_brain user$ bids-validator .
    Please visit https://neurostars.org/search?q=NOT_INCLUDED for existing conversations about this issue.

    2: [ERR] There are no subject folders (labeled "sub-*") in the root of this dataset. (code: 45 - SUBJECT_FOLDERS)

    Please visit https://neurostars.org/search?q=SUBJECT_FOLDERS for existing conversations about this issue.

    3: [ERR] The compulsory file /dataset_description.json is missing. See Section 03 (Modality agnostic files) of the BIDS specification. (code: 57 - DATASET_DESCRIPTION_JSON_MISSING)

    Please visit https://neurostars.org/search?q=DATASET_DESCRIPTION_JSON_MISSING for existing conversations about this issue.

    4: [ERR] Subject label in the filename doesn't match with the path of the file. File seems to be saved in incorrect subject directory. (code: 64 - SUBJECT_LABEL_IN_FILENAME_DOESNOT_MATCH_DIRECTORY)
        ./ses-baselinebrain/anat/sub-PS11_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS11_ses-baselinebrain_T1w.nii.gz is saved in incorrect subject directory as per sub-id in filename.
        ./ses-baselinebrain/anat/sub-PS17_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS17_ses-baselinebrain_T1w.nii.gz is saved in incorrect subject directory as per sub-id in filename.
        ./ses-baselinebrain/anat/sub-PS19_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS19_ses-baselinebrain_T1w.nii.gz is saved in incorrect subject directory as per sub-id in filename.
        ./ses-baselinebrain/anat/sub-PS20_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS20_ses-baselinebrain_T1w.nii.gz is saved in incorrect subject directory as per sub-id in filename.
        ./ses-baselinebrain/anat/sub-PS21_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS21_ses-baselinebrain_T1w.nii.gz is saved in incorrect subject directory as per sub-id in filename.
        ./ses-baselinebrain/anat/sub-PS23_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS23_ses-baselinebrain_T1w.nii.gz is saved in incorrect subject directory as per sub-id in filename.
        ./ses-baselinebrain/anat/sub-PS24_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS24_ses-baselinebrain_T1w.nii.gz is saved in incorrect subject directory as per sub-id in filename.
        ./ses-baselinebrain/anat/sub-PS26_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS26_ses-baselinebrain_T1w.nii.gz is saved in incorrect subject directory as per sub-id in filename.
        ./ses-baselinebrain/anat/sub-PS39_ses-baselinebrain_T1w.json
            Evidence: File: /ses-baselinebrain/anat/sub-PS39_ses-baselinebrain_T1w.json is saved in incorrect subject directory as per sub-id in filename.
        ./ses-baselinebrain/anat/sub-PS39_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS39_ses-baselinebrain_T1w.nii.gz is saved in incorrect subject directory as per sub-id in filename.
        ... and 36 more files having this issue (Use --verbose to see them all).

    Please visit https://neurostars.org/search?q=SUBJECT_LABEL_IN_FILENAME_DOESNOT_MATCH_DIRECTORY for existing conversations about this issue.

    5: [ERR] Session label in the filename doesn't match with the path of the file. File seems to be saved in incorrect session directory. (code: 65 - SESSION_LABEL_IN_FILENAME_DOESNOT_MATCH_DIRECTORY)
        ./ses-baselinebrain/anat/sub-PS11_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS11_ses-baselinebrain_T1w.nii.gz is saved in incorrect session directory as per ses-id in filename.
        ./ses-baselinebrain/anat/sub-PS17_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS17_ses-baselinebrain_T1w.nii.gz is saved in incorrect session directory as per ses-id in filename.
        ./ses-baselinebrain/anat/sub-PS19_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS19_ses-baselinebrain_T1w.nii.gz is saved in incorrect session directory as per ses-id in filename.
        ./ses-baselinebrain/anat/sub-PS20_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS20_ses-baselinebrain_T1w.nii.gz is saved in incorrect session directory as per ses-id in filename.
        ./ses-baselinebrain/anat/sub-PS21_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS21_ses-baselinebrain_T1w.nii.gz is saved in incorrect session directory as per ses-id in filename.
        ./ses-baselinebrain/anat/sub-PS23_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS23_ses-baselinebrain_T1w.nii.gz is saved in incorrect session directory as per ses-id in filename.
        ./ses-baselinebrain/anat/sub-PS24_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS24_ses-baselinebrain_T1w.nii.gz is saved in incorrect session directory as per ses-id in filename.
        ./ses-baselinebrain/anat/sub-PS26_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS26_ses-baselinebrain_T1w.nii.gz is saved in incorrect session directory as per ses-id in filename.
        ./ses-baselinebrain/anat/sub-PS39_ses-baselinebrain_T1w.json
            Evidence: File: /ses-baselinebrain/anat/sub-PS39_ses-baselinebrain_T1w.json is saved in incorrect session directory as per ses-id in filename.
        ./ses-baselinebrain/anat/sub-PS39_ses-baselinebrain_T1w.nii.gz
            Evidence: File: /ses-baselinebrain/anat/sub-PS39_ses-baselinebrain_T1w.nii.gz is saved in incorrect session directory as per ses-id in filename.
        ... and 36 more files having this issue (Use --verbose to see them all).

    Please visit https://neurostars.org/search?q=SESSION_LABEL_IN_FILENAME_DOESNOT_MATCH_DIRECTORY for existing conversations about this issue.

    1: [WARN] The recommended file /README is missing. See Section 03 (Modality agnostic files) of the BIDS specification. (code: 101 - README_FILE_MISSING)

    Please visit https://neurostars.org/search?q=README_FILE_MISSING for existing conversations about this issue.

        Summary:                Available Tasks:        Available Modalities:
        46 Files, 2.73GB
        0 - Subjects
        1 - Session

    If you have any questions, please post on https://neurostars.org/tags/bids.

effigies commented 2 weeks ago

This is almost certainly a problem with regular expressions, so I don't think this should affect the schema validator. Please add the schema label if it's still a problem.
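To illustrate the kind of regular-expression problem suspected here, the sketch below shows what can happen when a directory name containing `[11C]` is interpolated into a pattern without escaping. It is not the validator's actual code: the function `relative_path`, the shortened root path, and the use of Python rather than JavaScript are all assumptions made for illustration. It shows only the general mechanism and does not claim to reproduce the exact output above.

```python
import re

# Hypothetical, shortened dataset root containing the unwise characters.
root = "/data/First-in-human_evaluation_of_[11C]PS13"
file_path = (
    root + "/sub-PS11/ses-baselinebrain/anat/sub-PS11_ses-baselinebrain_T1w.nii.gz"
)


def relative_path(path: str, root: str, escape: bool) -> str:
    """Strip `root` from the front of `path` using a regex built from `root`."""
    # Without escaping, "[11C]" is parsed as a character class that matches a
    # single "1" or "C", so the pattern no longer matches the literal brackets.
    prefix = re.escape(root) if escape else root
    return re.sub("^" + prefix, "", path)


naive = relative_path(file_path, root, escape=False)
safe = relative_path(file_path, root, escape=True)

print(naive == file_path)  # the unescaped pattern fails to strip the root
print(safe)                # the escaped pattern yields the dataset-relative path
```

If something along these lines is the cause, escaping the root before building the pattern (or avoiding regular expressions for prefix stripping altogether) would be the usual fix.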