bids-standard / bids-2-devel

Discussions and suggestions of backwards incompatible changes to BIDS
https://bids.neuroimaging.io/
Creative Commons Attribution 4.0 International

Allow for using an arbitrary suffix and/or arbitrary entity #63

Open TheChymera opened 4 months ago

TheChymera commented 4 months ago

As mentioned in this issue there are a lot of quasi-BIDS standards. In fact, given that most people's data almost fits into BIDS, I think quasi-BIDS might be more popular than BIDS.

It has certainly been the case for me. Just two examples:

  1. Suffixes: I ended up using _cbv long before we got it merged because that's what my data was.
  2. Key-value pairs: I'm currently working on this data, which needs new key-value pairs that don't mesh well with current BIDS, or even with upcoming BIDS. It will thus remain quasi-BIDS for the foreseeable future.

I think the first problem arises from the fact that suffix is currently a whitelisted term. It's simply not conceivable that the “suffix” values we record will cover all possibly relevant values. It is also conceivable, given the rapid development of neuroimaging experiments, that even if we covered all relevant values, new ones would emerge long before we had a chance to add them. The same goes for key-value pairs, where values are indeed flexible, but keys are again whitelisted.

What I would propose is to increase flexibility by allowing the injection of custom key-value pairs, with the standard specifying a way to document them in the dataset_description.json. The same could be done for suffix, if we are to keep it as a special entity, where there are certain pre-defined keys, but ultimately the keys are free-form if they have an entry in dataset_description.json.

I know what you're thinking: it sounds like chaos. But from the usability point of view, keys are equivalent (well, maybe not run). Ultimately you just filter by them to create categories for comparisons. Allowing users to add a new custom category for their data would provide a more natural and lightweight method for key adoption than tying it to a BEP. It would also curb people's constant impulse to overload predefined key-value pairs: acq- in particular attracts a lot of that, but I've also seen it with task-. This would make the keys that we do specify less prone to abuse and more standard.
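To illustrate the "keys are just categories" point: once a filename is split into key-value pairs, grouping by a custom key such as a hypothetical seed- works exactly like grouping by sub- or run-. This is a minimal Python sketch; the filenames are invented, and the parser deliberately ignores BIDS rules such as entity order and allowed label characters.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_entities(filename: str) -> tuple[dict[str, str], str]:
    """Split a BIDS-style filename into key-value entities and a suffix.

    Every "key-value" chunk is accepted, standard key or not; entity order
    and label character rules are deliberately not checked.
    """
    stem = PurePosixPath(filename).name.split(".", 1)[0]  # drop ".nii.gz" etc.
    *chunks, suffix = stem.split("_")
    entities = {}
    for chunk in chunks:
        key, _, value = chunk.partition("-")
        entities[key] = value
    return entities, suffix


def group_by(filenames: list[str], key: str) -> dict:
    """Group filenames by the value of one entity (None when it is absent)."""
    groups = defaultdict(list)
    for name in filenames:
        entities, _ = parse_entities(name)
        groups[entities.get(key)].append(name)
    return dict(groups)


# Hypothetical filenames using a custom "seed" entity next to standard ones.
files = [
    "sub-01_seed-ACA_run-1_cbv.nii.gz",
    "sub-01_seed-VTA_run-1_cbv.nii.gz",
    "sub-02_seed-ACA_run-1_cbv.nii.gz",
]

print(group_by(files, "seed"))
```

Here group_by(files, "seed") puts the two ACA files in one group and the VTA file in another, just as group_by(files, "sub") splits them by subject. From the analysis side, nothing distinguishes the custom key from a standard one.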

As BIDS becomes bigger and more widely adopted I also expect the BEP process to become increasingly lengthy and complicated for newcomers, so it would be nice to have built-in “extension” support in the actual dataset. No pull/merge/delay involved.

yarikoptic commented 4 months ago

"Make BIDS-2 more flexible" is not specific enough. Many (if not all) issues filed here are already about that topic. To me it sounds like

Allow for using an arbitrary suffix and/or arbitrary entity

which I will rename this issue to now. And indeed, from the current description it sounds like "chaos", and the desired effect is, to a degree, already possible: if someone does not care about standardizing (which is in effect what whitelisting within the standard does), they can just use such suffixes/entities and add them to .bidsignore to pacify the validator. The point is that nobody besides selected few would know how to treat those ad-hoc suffixes/entities. But OK, let's use the aforementioned suggested name for this method. And the "chaos" could be reduced, as such a specification could "complement" the BIDS schema, and thus BIDS validator should be able to take advantage of those additional suffixes/entries, so they would not be ignored but validated against, and tools/queries (if they support flexible suffix/entity handling) would be able to handle them.

But overall, it might be a large effort to formalize such a specification. It is pretty much on par with the "Allow for a list of BIDS Extensions to augment schema with" approach below, in that it would require a "language" flexible enough to specify them across different data types, etc. "Allow to point to a forked BIDS schema" might be much easier to achieve but would be more fragile.

Alternatives/Complementary approaches:

The whole point behind BEPs, and behind my facilitating smaller atomic changes such as PRs for adding individual entities (ref: https://github.com/bids-standard/bids-specification/issues/371), is to standardize the semantics behind those added suffixes/entities.

But then we can parallel to

Now that we have BIDS schema, it sounds feasible to indeed formalize some way to establish specification of either "Forked BIDS" or "BIDS with extensions".

Allow to point to a forked BIDS schema

You must remember how we used that custom BIDSVersion with a branch of the modified schema in DANDI. So this could be one way to allow a "forked BIDS", where e.g. BIDSVersion would point to a forked schema prefix. Tools would then need to be able to download and use that schema instead of the one identified by a version string. But it would still limit applicability, since it would not be clear (to a human or a machine) what the changes in the fork are. This is similar to the concern I raised in https://github.com/ome/ngff/issues/228#issuecomment-1960434309, where the idea is to point to a zarr extension with a versioned URL, which does not reveal the "semantics" of that target.

Allow for a list of BIDS Extensions to augment schema with

Work out how to provide an "overlay"/"extension" to the BIDS schema, and then have something like BIDSExtensions: [...], which would allow for an overlay on top of the current schema (specified in BIDSVersion). That is somewhat more in line with NWB extensions, which go even further and store that schema within the NWB file itself.
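One way to picture the overlay idea is a merge of extension tables onto the base schema, applied in list order, which also forces a decision about what happens when two extensions define the same entity. This is a minimal Python sketch; BASE_ENTITIES, apply_extensions and ExtensionConflict are hypothetical names, and the "schema" here is just a dict from entity key to description, far simpler than the real BIDS schema.

```python
from collections.abc import Mapping

# Hypothetical, heavily simplified "schema": entity key -> description.
# The real BIDS schema also encodes formats, datatypes, ordering and more.
BASE_ENTITIES = {
    "sub": "Subject label",
    "ses": "Session label",
    "run": "Run index",
}


class ExtensionConflict(ValueError):
    """Raised when two sources define the same entity differently."""


def apply_extensions(base: Mapping[str, str],
                     extensions: list[Mapping[str, str]]) -> dict[str, str]:
    """Overlay extensions on a base entity table, in list order.

    Re-declaring an entity with an identical definition is tolerated; a
    different definition raises instead of silently overriding the first.
    """
    merged = dict(base)
    origin = {key: "the base schema" for key in base}
    for index, extension in enumerate(extensions):
        for key, definition in extension.items():
            if key in merged:
                if merged[key] != definition:
                    raise ExtensionConflict(
                        f"entity {key!r} in extension #{index} conflicts with {origin[key]}"
                    )
                continue
            merged[key] = definition
            origin[key] = f"extension #{index}"
    return merged


merged = apply_extensions(BASE_ENTITIES, [{"seed": "Injection site for connectivity tracer"}])
print(sorted(merged))  # ['run', 'seed', 'ses', 'sub']
```

Whether a conflict should be an error or be resolved by precedence (for example, "last extension wins") is exactly the kind of rule such a specification would have to pin down; the sketch raises an error only for illustration.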

TheChymera commented 4 months ago

.bidsignore [vs.] thus BIDS validator should be able to take advantage of those additional suffixes/entries, so they would not be ignored but validated against

Yes, that's what I meant. .bidsignore just means “this is not BIDS”, which is useful to keep, but it would be good to allow additional key-value pairs to be validated and parsed.


The problem I see with these two approaches you suggested (and incidentally with NWB extensions):

Allow for a list of BIDS Extensions to augment schema with

Allow to point to a forked BIDS schema

is that now you need to keep track of not only your data, but the corresponding schema directory, and of course a validator that can parse it. This is the validator version vs. BIDS version vs. schema version discussion again, but with the added complexity of plug-in management/compatibility (what if two schema extensions conflict?) on top of that.

In a sense, the reason we have a schema at all is to manage the logic of what entries are allowed where. Following the reference rewrite (see the lines starting with $ref here), the schema has become more parsimonious but less transparent for human reading.

So, if anything, the concern about lack of transparency

The point is that nobody besides selected few would know how to treat those ad-hoc suffixes/entities.

would apply to schema forking and schema extensions more than to custom key injection. With respect to the “user”, i.e. analyzer of the data, ultimately keys are just categories for analysis, which can have a human-readable blurb, like everything else does.

A simple specification for adding custom keys to a dataset would be easy to produce for anybody willing to look up an example, and easy to interpret for anybody willing to open the description text file:

{
  "Name": "The mother of all experiments",
  "BIDSVersion": "2.0.0",
  "DatasetType": "raw",
  "License": "CC0",
  "Authors": [
    "Paul Broca",
    "Carl Wernicke"
  ],
  "CustomKeys": [
    {
      "Shorthand": "seed",
      "Description": "Injection site for connectivity tracer",
      "Directory": true
    }
  ]
}

Since the key is dataset-specific, the schema logic is unnecessary, so I see no need to overcomplicate this specification process by requiring it to be encoded in a derived schema.

yarikoptic commented 4 months ago

As is, this would be "too open", since it would not bind to specific modalities etc., would not state the order among entities (OK, it could be added last; or have "AfterEntity" to specify which entity to go after), or at which level the Directory is created, although it might play well with

So it would indeed be more useful for humans than just ignoring such files via .bidsignore, but it might need more thought on specification (note: there are no Keys, there are Entities) to become viable.
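If explicit ordering stays, the "AfterEntity" idea could work roughly like this: a custom entity is spliced into the standard order right after a named entity, or appended last when no anchor is given. This is a minimal Python sketch; STANDARD_ORDER is an abbreviated, assumed subset of the BIDS entity order, and entity_order and build_filename are hypothetical helpers.

```python
# Abbreviated subset of BIDS entity keys in specification order (an
# assumption for illustration; the real table has many more entities).
STANDARD_ORDER = ["sub", "ses", "task", "acq", "ce", "rec", "dir", "run", "echo", "desc"]


def entity_order(custom: list[dict]) -> list[str]:
    """Return the entity order with custom entities spliced in.

    Each custom entry has a "Shorthand" and an optional, hypothetical
    "AfterEntity"; entries without it are appended last, in declaration order.
    """
    order = list(STANDARD_ORDER)
    trailing = []
    for entry in custom:
        after = entry.get("AfterEntity")
        if after is None:
            trailing.append(entry["Shorthand"])
        elif after in order:
            order.insert(order.index(after) + 1, entry["Shorthand"])
        else:
            raise ValueError(f"unknown AfterEntity {after!r} for {entry['Shorthand']!r}")
    return order + trailing


def build_filename(entities: dict[str, str], suffix: str, extension: str,
                   custom: list[dict]) -> str:
    """Assemble a filename, emitting entities in the computed order."""
    order = entity_order(custom)
    undeclared = set(entities) - set(order)
    if undeclared:
        raise ValueError(f"undeclared entities: {sorted(undeclared)}")
    parts = [f"{key}-{entities[key]}" for key in order if key in entities]
    return "_".join(parts + [suffix]) + extension


custom = [{"Shorthand": "seed", "AfterEntity": "ses"}]
print(build_filename({"run": "1", "seed": "ACA", "sub": "01"}, "cbv", ".nii.gz", custom))
# sub-01_seed-ACA_run-1_cbv.nii.gz
```

Without "AfterEntity", the entity simply goes last, matching the "could be added last" option; if BIDS 2 dropped explicit ordering altogether, entity_order would not be needed.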

TheChymera commented 4 months ago

but it might need more thought on specification

Ok, I'd be happy to think it out more.

could be added last; or have "AfterEntity" to specify which entity to go after

Is explicit ordering going to be part of BIDS2? If not, AfterEntity is not needed.

I always thought “entity” refers to the “key”+“value” pair, but ok, let's fix that. Anything else we should maybe include in the following?

{
  "Name": "The mother of all experiments",
  "BIDSVersion": "2.0.0",
  "DatasetType": "raw",
  "License": "CC0",
  "Authors": [
    "Paul Broca",
    "Carl Wernicke"
  ],
  "CustomEntities": [
    {
      "Shorthand": "seed",
      "Description": "Injection site for connectivity tracer",
      "Directory": true
    }
  ]
}
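A validator could read such a block so that custom entities are checked rather than ignored. This is a minimal Python sketch. It assumes the CustomEntities field names from the JSON above, which are a proposal and not part of any released BIDS version. It also assumes an abbreviated list of standard entities and two hypothetical helpers, load_custom_entities and undeclared_keys. The sketch only checks field types and that every entity key in a filename is either standard or declared. It leaves open what "Directory" should imply for the file layout, as the thread does. Note that a strict JSON parser such as Python's json module rejects a trailing comma after the last member of an object or array.

```python
import json

# Abbreviated, assumed subset of standard BIDS entity keys.
STANDARD_ENTITIES = {"sub", "ses", "task", "acq", "ce", "rec", "dir", "run", "echo", "desc"}
# Fields of the proposed (not standardized) CustomEntities entries.
REQUIRED_FIELDS = {"Shorthand": str, "Description": str, "Directory": bool}


def load_custom_entities(description_text: str) -> dict[str, dict]:
    """Parse dataset_description.json text and check its CustomEntities block."""
    description = json.loads(description_text)  # raises on trailing commas
    declared = {}
    for entry in description.get("CustomEntities", []):
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(entry.get(field), expected_type):
                raise ValueError(f"{field!r} must be a {expected_type.__name__} in {entry!r}")
        key = entry["Shorthand"]
        if key in STANDARD_ENTITIES or key in declared:
            raise ValueError(f"entity {key!r} is already defined")
        declared[key] = entry
    return declared


def undeclared_keys(path: str, declared: dict[str, dict]) -> list[str]:
    """List entity keys in a filename that are neither standard nor declared."""
    stem = path.rsplit("/", 1)[-1].split(".", 1)[0]
    keys = [chunk.split("-", 1)[0] for chunk in stem.split("_")[:-1]]
    return [key for key in keys if key not in STANDARD_ENTITIES and key not in declared]


example = ('{"BIDSVersion": "2.0.0", "CustomEntities": [{"Shorthand": "seed", '
           '"Description": "Injection site for connectivity tracer", "Directory": true}]}')
declared = load_custom_entities(example)
print(undeclared_keys("sub-01_seed-ACA_foo-1_cbv.nii.gz", declared))  # ['foo']
```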
yarikoptic commented 4 months ago

Is explicit ordering going to be part of BIDS2?

I don't think there is any incentive so far to get rid of it, but looking into how to implement #54 might actually shed some light on that.

As for anything else, I think it would be valuable and useful for BEP032 if you looked into contributing to

on some entity that is to be added in BEP032 (if any), to see what it "takes", i.e., what is there besides order (e.g., a specification of which data types it applies to, etc.). That, in turn, would guide you toward a complete spec for the above.