bids-standard / bids-2-devel

Discussions and suggestions of backwards incompatible changes to BIDS
https://bids.neuroimaging.io/
Creative Commons Attribution 4.0 International

Remove the suffix and exclusively use entities in the filename. #58

Open robertoostenveld opened 1 year ago

robertoostenveld commented 1 year ago

The discussion https://github.com/bids-standard/bids-specification/issues/1602 shows that there is no universal agreement on when some information is to be coded as a suffix (at the end of the filename just prior to the extension, e.g., _bold) or as an entity (like <key>-<value>).

I propose to remove that source of conflict in BIDS 2 by removing the suffix altogether. To me, the suffix serves the same purpose as the value in an entity, except that the name has been left out. That is, I propose that _bold.nii.gz would become _suffix-bold.nii.gz. Instead of suffix, another name (or names) could be given to these entities.

The consequence would be that the whole filename up to the first period . (which indicates the start of the file extension, see *) can be parsed on the underscore to separate entities, and each entity can be parsed on the dash to split its name and value.

*) The file extension (e.g., .tsv, .h5, .nii.gz) would remain as it is and indicate how the file is technically to be parsed as an ASCII and/or binary stream.
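The parsing rule described above can be sketched in a few lines of Python. This is only an illustration of the proposal, not code from BIDS or any BIDS tool: the function names `parse_proposed_name` and `add_suffix_entity` are made up for this example, and `suffix` is used as the entity name only because it is the placeholder used in the proposal.

```python
def parse_proposed_name(filename):
    """Split a suffix-free filename into (entities, extension)."""
    # Everything before the first period is the stem; the rest is the extension.
    stem, dot, ext = filename.partition(".")
    entities = {}
    for chunk in stem.split("_"):
        # Split on the first dash only, so the value is kept intact.
        key, dash, value = chunk.partition("-")
        if not dash or not key or not value:
            raise ValueError(f"not a key-value entity: {chunk!r}")
        entities[key] = value
    return entities, dot + ext


def add_suffix_entity(filename, name="suffix"):
    """Rewrite a current-style name so its trailing suffix becomes an entity."""
    stem, dot, ext = filename.partition(".")
    head, _, last = stem.rpartition("_")
    if "-" in last:
        # The last chunk already looks like an entity; nothing to convert.
        return filename
    prefix = head + "_" if head else ""
    return f"{prefix}{name}-{last}{dot}{ext}"


new_name = add_suffix_entity("sub-01_task-rest_bold.nii.gz")
entities, extension = parse_proposed_name(new_name)
```

With a current-style name such as `sub-01_task-rest_bold.nii.gz`, `parse_proposed_name` raises a `ValueError` on the bare `bold` chunk, which is exactly the special case the proposal would remove.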

oesteban commented 1 year ago

As I mentioned in the other thread, this inevitably leads to discussing whether the new entity (suffix-, if you will) is given priority to appear last and, with that, to a strict definition of the ordering of entities.

In my opinion, removing the suffix damages human readability with a meager return.

robertoostenveld commented 1 year ago

The BIDS standard already specifies a strict ordering of the entities, and I am not proposing to change that.

The ordering of entities, and whether each is OPTIONAL, REQUIRED, or MUST NOT be specified for a given file type, is specified in the Entity Table.
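As a rough illustration of what such a check involves, here is a minimal sketch. The `ASSUMED_ORDER` list is an abbreviated subset chosen for the example, and `ASSUMED_LEVELS` is a hypothetical rule set for a single invented file type; the authoritative ordering and requirement levels are the ones in the Entity Table, not these. The function name `check_entities` is also made up.

```python
# Illustrative subset only; the authoritative order is defined by the BIDS Entity Table.
ASSUMED_ORDER = ["sub", "ses", "task", "acq", "run", "mod", "echo"]

# Hypothetical requirement levels for one invented file type.
# Entities from ASSUMED_ORDER that are absent here are treated as MUST NOT.
ASSUMED_LEVELS = {
    "sub": "REQUIRED",
    "ses": "OPTIONAL",
    "task": "REQUIRED",
    "acq": "OPTIONAL",
    "run": "OPTIONAL",
    "echo": "OPTIONAL",
}


def check_entities(keys, order=ASSUMED_ORDER, levels=ASSUMED_LEVELS):
    """Return a list of problems; an empty list means the keys pass."""
    problems = []
    rank = {key: i for i, key in enumerate(order)}
    for key in keys:
        if key not in rank:
            problems.append(f"unknown entity: {key}")
        elif key not in levels:
            problems.append(f"entity not allowed here: {key}")
    for key, level in levels.items():
        if level == "REQUIRED" and key not in keys:
            problems.append(f"missing required entity: {key}")
    # Compare the observed order of known entities with their ranked order.
    known = [key for key in keys if key in rank]
    if known != sorted(known, key=rank.__getitem__):
        problems.append("entities are out of order")
    return problems
```

For example, `check_entities(["task", "sub"])` reports only an ordering problem, while `check_entities(["sub", "mod"])` reports a disallowed entity and a missing required one.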

oesteban commented 1 year ago

> The BIDS standard already specifies a strict ordering of the entities, and I am not proposing to change that.
>
> The ordering of entities, and whether each is OPTIONAL, REQUIRED, or MUST NOT be specified for a given file type, is specified in the Entity Table.

Sure, that's manageable for BIDS "raw". But the problem scales with the number of entities, and BIDS Derivatives is set to define a fair number of them.

> The discussion bids-standard/bids-specification#1602 shows that there is no universal agreement on when some information is to be coded as a suffix (at the end of the filename just prior to the extension, e.g., _bold) or as an entity (like <key>-<value>).

That proposal does not point at such a problem; only the discussion after it could be interpreted that way. This proposal (i.e., removing the suffix) does not describe what it is solving. It just opens up some flexibility, with two goals:

  1. to add a name to the suffix entity so that strong opinions can be tempered and say, "if you don't like modality-, we create one more suffix-like entity (i.e., last required) that is of your liking, what about sampling-"; and
  2. the suffix is not under a controlled vocabulary anymore, so the user has total flexibility over what goes last.

I think (1) is just a countermeasure to open up room for agreement on a problem we currently don't have, and (2) leads to total flexibility that will require additional metadata to describe the dataset. (2) is not a bad idea in theory, but I would honestly move to other alternatives (I have mentioned NIDM a number of times) with more programmatic and reliable foundations to describe the data. BIDS should offer something easy to use and highly readable for humans.

yarikoptic commented 1 year ago

In general I agree with the motivation for the change. I would only vote against adding yet another semantically meaningless _suffix- and would instead look at which entities the current values would need to be mapped to, starting from the current suffixes and providing such a mapping for at least a good portion of them. That would require some thought about semantically meaningful entities. FTR, ATM we seem to have 103 suffixes within suffixes.yaml. _mod- could have absorbed T1w and inplaneT1, since that is where we currently specify those suffixes should be placed when creating a derived (e.g., _defacemask) image. But something like _defacemask and _mask would then not fit _mod. What would that be?
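To make the idea of such a mapping concrete, here is a small sketch. The table `SUFFIX_TO_ENTITY` is hypothetical and contains only the two examples named in the comment above; the helper names are invented, and nothing here is part of the BIDS schema.

```python
# Hypothetical mapping from current suffix values to a named entity.
# Only the two examples mentioned in the thread are filled in.
SUFFIX_TO_ENTITY = {"T1w": "mod", "inplaneT1": "mod"}


def suffix_as_entity(suffix, mapping=SUFFIX_TO_ENTITY):
    """Return 'entity-value' for a mapped suffix, or None if no entity fits yet."""
    entity = mapping.get(suffix)
    if entity is None:
        return None
    return f"{entity}-{suffix}"


def unmapped(suffixes, mapping=SUFFIX_TO_ENTITY):
    """List the suffixes that still lack a semantically meaningful entity."""
    return sorted(s for s in suffixes if s not in mapping)
```

Running `unmapped` over the full list of suffixes would show how much of the vocabulary still has no home, for instance `mask` and `defacemask` in the example above.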

oesteban commented 1 year ago

> In general I agree with the motivation for the change.

And what is that motivation? I truly don't know what it is.

yarikoptic commented 1 year ago

ATM the suffix has no clear semantic meaning. ATM it aims to be a "human accessible term best describing what is in the file", the values of which are a mix of

I think it is as a result of this absent semantic clarity that, when contemplating new "suffixes", it becomes unclear what should go into the suffix vs. some other entity: should a new suffix be created, or an entity, or a mix of the two, etc. And that is what I think prompted @robertoostenveld to file this issue. I remember us stumbling on how to formalize the naming of derived files, and that is how, IIRC, _mod for _mod-T1w_defacemask was born, since we had to place the existing suffix somewhere.

TheChymera commented 7 months ago

@oesteban

> And with that, a strict definition of the ordering of entities.

Don't we already have this? I've always seen subject and session first, or is this slated to be removed in BIDS2?

yarikoptic commented 7 months ago

Yes, we have a clear ordering and, AFAIK, always have had so far. What is to be done for BIDS 2.0, or whether there would be any effect from

is yet to be decided. Not sure what @oesteban had in mind while talking about derivatives since, as @robertoostenveld pointed out, the order is universal across modalities and specified in https://github.com/bids-standard/bids-specification/blob/master/src/schema/rules/entities.yaml . Note that _mod, which is the closest candidate for possibly absorbing the suffix, is in the middle of the ordering.