bids-standard / bids-validator

Validator for the Brain Imaging Data Structure
https://bids-standard.github.io/bids-validator/
MIT License
181 stars 109 forks source link

Adding one "Extra_Information" key on the sidecar .json file. #1571

Open Fayed-Rsl opened 1 year ago

Fayed-Rsl commented 1 year ago

As a reminder, these are the keys that are allowed by the BIDS validator for MEG files. (See this for allowed keys in the JSON file: https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/02-magnetoencephalography.html#sidecar-json-_megjson)

The problem is that I have more information that I want to add in my sidecares to each specific file recorded, but there is no keys that fit my purpose. Passing to the BIDS validator a JSON file with unallowed keys will provide the following error:

[Code 27] JSON_INVALID Not a valid JSON file.

Thus, I am constrained to create a .txt file in addition to the json sidecare (and also having a .bidsignore file to ignore every .txt extension), to enter my extra information in the form of a dictionnary. I found this solution rather unappealing. I would prefer to add my information immediately into the JSON file.

I think allowing the optional key "ExtraInformation"where people are free to give more information on specific acquisition and task settings regarding the recording would be really nice if the BIDS validator wouldn't raise an error. I suggest the data type of the key to be: a string / array of string / dictionnary.

effigies commented 1 year ago

I would be inclined to permit additional properties in general, as we do in MRI. I think the idea was to make it harder to make uncaught typos, but this was always the other side of that coin.

I don't know how feasible it would be to collect these additional fields and emit warnings that suggest known metadata fields that are "close" to the unknown fields.

sappelhoff commented 1 year ago

I think the idea was to make it harder to make uncaught typos, but this was always the other side of that coin.

yes, and to some extend also preventing users to go wild on adding undocumented metadata (i.e., metadata that may be hard to interpret because it's not described in the spec), and instead nudging them to open issues on bids-specification to discuss general solutions (like "official" metadata fields in the JSON).

@Fayed-Rsl what data do you want to add?

I would be inclined to permit additional properties in general, as we do in MRI.

I would not be strictly opposed to that.

I don't know how feasible it would be to collect these additional fields and emit warnings that suggest known metadata fields that are "close" to the unknown fields.

+1, sounds like a good idea to me

Fayed-Rsl commented 1 year ago

Let's take this json file as an example:

{ "TaskName": "Hold", "Manufacturer": "Elekta", "PowerLineFrequency": 50.0, "SamplingFrequency": 2000.0, "SoftwareFilters": { "SpatialCompensation": { "GradientOrder": 0 } }, "RecordingDuration": 724.4995, "RecordingType": "continuous", "DewarPosition": "n/a", "DigitizedLandmarks": true, "DigitizedHeadPoints": false, "MEGChannelCount": 306, "MEGREFChannelCount": 0, "ContinuousHeadLocalization": true, "HeadCoilFrequency": [ 293.0, 307.0, 314.0, 321.0 ], "EEGChannelCount": 8, "EOGChannelCount": 2, "ECGChannelCount": 0, "EMGChannelCount": 4, "MiscChannelCount": 0, "TriggerChannelCount": 3 }

Here my main idea was to add after TriggerChannelCount, a new key named "ExtraInformation" for example, such as:

...
"TriggerChannelCount": 3
"ExtraInformation": {
    "Information1": "Note that the recording was ...",
    "Information2": "Please take in account that ...",
    "Information3": {
        "Information3.1": "The subject didn't ...",
        "Target1": "This particular parameter was at this frequency ..."
    }
}

}

Of course, the keys within the ExtraInformation should remain free to the user, and only the "ExtraInformation" key would be the one that is checked within the BIDS validator. On the BIDS specification it would be something like this: Key name: ExtraInformation Requirement Level: Optional Data type: string | string of array | dictionnary Description: Additional information regarding the recording.

sappelhoff commented 1 year ago

I see, thanks! Why wouldn't you want to put this (seemingly free text, general info) into the README of the dataset?

The danger with putting custom metadata into BIDS JSON files is that automated tools likely won't know what to do with this and in the best case will only alert users that there is some unknown metadata in the JSON files. Users are more likely to read the README, than to dig through JSON sidecars, in my experience :thinking:

Fayed-Rsl commented 1 year ago

I see, thanks! Why wouldn't you want to put this (seemingly free text, general info) into the README of the dataset?

Because these information and parameters changes for each subject and recording session depending on the type of session | task | acquisition.

ps: Thank you for your quick answers.

sappelhoff commented 1 year ago

@bids-standard/maintainers and also @robertoostenveld (with whom I imagine I talked about this many months ago) what do you think of allowing arbitrary metadata fields in sidecar JSON files? In MRI this is already permitted, in ephys, it is not:

https://github.com/bids-standard/bids-validator/blob/d3d4439750d0c89e7575072d59847fa90cc32e10/bids-validator/validators/json/schemas/eeg.json#L86

--> https://json-schema.org/understanding-json-schema/reference/object.html#additional-properties

effigies commented 1 year ago

Burying these notes in the JSON sidecar does not seem very helpful, IMO. If I were looking for a place to put notes at the file level, I would probably want them condensed as a column in scans.tsv, which would allow someone receiving the data to easily see when a note was consistent across files or different.

effigies commented 1 month ago

The schema validator consistently checks metadata across all files. Unknown fields are ignored; missing required fields produce errors and missing recommended fields produce warnings.