bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International
274 stars 157 forks source link

[ENH] Allow plus signs in labels #1926

Open tsalo opened 1 week ago

tsalo commented 1 week ago

Closes #1165. Adds + to the valid characters for the "label" format, which is currently only used for a subset of entities.

tsalo commented 1 week ago

In this PR, plus signs are allowed for any label-format value, but I believe some folks (@effigies?) were thinking of limiting them to special multi-label cases. If we want to go down that route, then we'll need to create a new "format" (e.g., "multilabel") that applies to specific entities. I'm not 100% which entities we would want the multilabel format for... perhaps acq, desc, and space?

effigies commented 1 week ago

Reading https://github.com/bids-standard/bids-specification/issues/1165#issuecomment-1549699835 and the following discussion, I think the overall consensus was toward permissiveness; i.e., allow in any label instead of splitting out a new multi-label concept.

I feel like the discussion about "semantics" was not entirely clear. I personally feel we should refrain from implying a relation between ent-X+Y and ent-X or ent-Y that tooling would need to respect. Perhaps we want to say something like:

Free-form labels with alphanumeric and plus (+) characters. Plus signs MAY be used to concatenate multiple applicable labels, but no relationship is established by a partial match. In particular, the inheritance principle does not connect files containing ent-X+Y and ent-X or ent-Y.

tsalo commented 1 week ago

I like that!

yarikoptic commented 1 week ago

Just a sidenote FTR: There is also a question of either to allow it in suffixes (e.g. would converge DANDI layout closer (https://github.com/dandi/dandi-cli/issues/1498) but since we do not have any "incident" of that in BIDS ATM, it is not appropriate to change ATM even though in general I see that suffixes also should be of the same nature as "labels", in particular that IIRC their values could "migrate" into _mod- entity.

yarikoptic commented 1 week ago

I pushed some changes which I think are due as well, although might need tune ups -- individual commits might have more information/reasoning in the commit messages. But there is also IMHO outstanding review and possibly tune up needed to src/metaschema.json

it does have plenty of "alphanumeric" and _ allowances and I am not yet 100% sure none of them somehow overlaps with "alphanumeric" possible in filename etc... most likely none, but still worth looking at IMHO ```shell ❯ git grep 'a-zA-Z0-9[^+]' -- src/metaschema.json src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9][a-zA-Z0-9_-]*$": { src/metaschema.json: "^_[a-zA-Z0-9_-]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { "$ref": "#/definitions/suffixRule" } src/metaschema.json: "^[a-zA-Z0-9_]+$": { "$ref": "#/definitions/suffixRule" } src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "^[a-zA-Z0-9_]+$": { src/metaschema.json: "items": { "type": "string", "pattern": "^[a-zA-Z0-9]+$" } ```
tsalo commented 4 days ago

@Remi-Gau that's a great point. Do you think it makes more sense to create a new example dataset or to modify an existing one?

I took a quick look at the BIDS examples and https://github.com/bids-standard/bids-examples/tree/master/ds004332 looks promising. There are complex acq entities that seem to convey multiple discrete pieces of information. For example, acq-mpragePMCoff, acq-mpragePMCon, acq-t2starPMCoff, and acq-t2starPMCon could be changed to acq-mprage+pmc+off etc.

EDIT: If you think creating a new dataset makes more sense, then I think it would be good to use something like CUBIDS since that will use the acq entity to flag different kinds of variants (e.g., different numbers of volumes, different flip angles) in a dataset.

Remi-Gau commented 4 days ago

I am afraid that if we start adding a new dataset for every new aspect of the spec we need to validate we will end up with too many examples.

How about adding this to the synthetic example?

tsalo commented 4 days ago

That works! I'll look into modifying https://github.com/bids-standard/bids-examples/tree/master/synthetic.