Open robertoostenveld opened 1 year ago
As I mentioned in the other thread, this inevitably leads to discussing that the new entity, `suffix-` if you will, is given priority to appear last. And with that, a strict definition of the ordering of entities.
In my opinion, removing the suffix damages human readability with a meager return.
The BIDS standard already specifies a strict ordering of the entities, and I am not proposing to change that.
The ordering of entities, and whether each is OPTIONAL, REQUIRED, or MUST NOT be specified for a given file type, is specified in the Entity Table.
Sure, that's manageable for BIDS "raw". But the problem scales with the number of entities, and BIDS Derivatives is set out to define a fair bunch.
The discussion bids-standard/bids-specification#1602 shows that there is no universal agreement on when some information is to be coded as a suffix (at the end of the filename just prior to the extension, e.g., `_bold`) or as an entity (like `<key>-<value>`).
That proposal does not point at such a problem, only the discussion after it could be interpreted in that way. This proposal (i.e., removing the suffix) does not describe what it is solving. It just opens some flexibility with two goals:
`modality-`, we create one more suffix-like entity (i.e., last required) that is of your liking, what about `sampling-`"; and I think (1) is just a countermeasure to open a space for agreement on a problem we currently don't have, and (2) leads to total flexibility that will require additional metadata to describe the dataset. (2) is not a bad idea in theory, but I would honestly move to other alternatives (I have mentioned NIDM a bunch of times) with more programmatic and reliable foundations to describe the data. BIDS should offer something easy to use and highly readable for humans.
In general I agree with the motivation for the change. I would only vote not to add yet again a semantically meaningless `_suffix-`, but to see to which entities the current values would need to be mapped: start by looking at the current ones and provide such a mapping for at least a good portion of them. But it would require some thought about semantically meaningful entities. FTR -- ATM we seem to have 103 suffixes within suffixes.yaml. `_mod-` could have absorbed `T1w`, `inplaneT1`, since that is where we currently specify those suffixes to be placed when creating a derived (e.g. `_defacemask`) image. But something like `_defacemask` and `_mask` would then not fit `_mod`. What would that be?
In general I agree with the motivation for the change.
And what is that motivation? I truly don't know what it is.
ATM suffix has no clear semantic meaning. ATM it aims to be a "human accessible term best describing what is in the file", values for which are a mix of
- (`_T1w`, ...)
- (`_eeg` but not some "bands" within eeg modality)
- (`_defacemask`)
- (`_dseg`)
- `_???map`s per now formalized guidelines

I think it is as a result of this absent semantic clarity that, while contemplating new "suffixes", it becomes unclear what should go into the suffix vs some other entity: should a new suffix be created, or an entity, or a mix of the two, etc. And that is what I think prompted @robertoostenveld to file this issue.
As I remember it, we stumbled on how to formalize the naming of derived files, and that is how, IIRC, `_mod` for `_mod-T1w_defacemask` was born, since we had to place the existing suffix somewhere.
@oesteban
And with that, a strict definition of the ordering of entities.
Don't we already have this? I've always seen subject and session first, or is this slated to be removed in BIDS2?
Yes, we have a clear ordering and AFAIK always have had so far. What is to be done for BIDS 2.0, or whether there would be an effect from
is yet to be decided. Not sure what @oesteban had in mind when talking about derivatives since, as @robertoostenveld pointed out, the order is universal across modalities and specified in https://github.com/bids-standard/bids-specification/blob/master/src/schema/rules/entities.yaml . Note that `_mod`, which is the closest candidate for possibly absorbing the suffix, is in the middle of the ordering.
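To make the "universal ordering" point concrete, here is a minimal sketch of checking the entity keys of a filename against a fixed order. The `ASSUMED_ORDER` list is only a small illustrative subset chosen for this example, not the authoritative list from entities.yaml, and the function name is hypothetical.

```python
# Illustrative subset only (an assumption); the authoritative order lives in entities.yaml.
ASSUMED_ORDER = ["sub", "ses", "task", "acq", "run", "mod", "desc"]


def entities_in_order(keys, order=ASSUMED_ORDER):
    """Return True if the known keys appear in the same relative order as in `order`.

    Keys that are not listed in `order` are ignored by this sketch.
    """
    positions = [order.index(k) for k in keys if k in order]
    return positions == sorted(positions)


print(entities_in_order(["sub", "ses", "mod", "desc"]))  # True
print(entities_in_order(["sub", "mod", "ses"]))          # False
```

Note that in this subset `mod` sits in the middle, so a key absorbing the suffix would not automatically end up last.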
The discussion https://github.com/bids-standard/bids-specification/issues/1602 shows that there is no universal agreement on when some information is to be coded as a suffix (at the end of the filename just prior to the extension, e.g., `_bold`) or as an entity (like `<key>-<value>`).

I propose to remove that source of conflict in BIDS 2 by removing the suffix altogether. To me the suffix serves the same purpose as the value in an entity, except that the name has been left out. I.e., I propose that `_bold.nii.gz` were to become `_suffix-bold.nii.gz`. Instead of `suffix`, another name (or names) could be given to these entities.

The consequence would be that the whole filename up to the first period `.` (which indicates the start of the file extension, see *) can be parsed on the underscore to separate entities, and each entity can be parsed on the dash to split its name and value.

*) The file extension (e.g., `.tsv`, `.h5`, `.nii.gz`) would remain as it is and provide information about how the file is technically to be parsed as an ascii and/or binary stream.
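The parsing rule described in this proposal could be sketched roughly as follows. This is a minimal illustration under the assumptions stated in the proposal (everything before the first period is underscore-separated `<key>-<value>` entities); the function name `parse_proposed_filename` and the example filename are hypothetical and not part of any BIDS tooling.

```python
def parse_proposed_filename(filename):
    """Split a filename under the proposed suffix-free scheme.

    Returns a dict of entities and the file extension (including the leading period).
    """
    # Everything up to the first period is the entity part; the rest is the extension.
    stem, dot, ext = filename.partition(".")
    entities = {}
    for chunk in stem.split("_"):
        # Each entity is split on its first dash into a name and a value.
        key, dash, value = chunk.partition("-")
        if not dash or not key or not value:
            raise ValueError(f"not a key-value entity: {chunk!r}")
        entities[key] = value
    return entities, dot + ext


entities, extension = parse_proposed_filename("sub-01_task-rest_suffix-bold.nii.gz")
print(entities)   # {'sub': '01', 'task': 'rest', 'suffix': 'bold'}
print(extension)  # .nii.gz
```

With this sketch, a current-style name such as `sub-01_task-rest_bold.nii.gz` raises a `ValueError` at the bare `bold` chunk, which is exactly the special case the proposal aims to remove.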