bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International
267 stars 154 forks source link

[ENH] Clarify the relation of motion.tsv columns to channels.tsv rows #1699

Closed effigies closed 4 months ago

effigies commented 5 months ago

The motion spec does not say whether there is a header column or not, only that the columns must match the rows of channels.tsv. From unofficial communications, I believe the intent is that motion.tsv files do not have column headers, and channels.tsv defines the column names.

This changes it to state explicitly that there are no column headers, and that the ordering is defined by the rows in channels.tsv.

cc @sjeung @JuliusWelzel

effigies commented 5 months ago

This relates to https://github.com/bids-standard/bids-validator/issues/1889.

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (fd6612e) 87.97% compared to head (83c52d6) 87.97%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1699 +/- ## ======================================= Coverage 87.97% 87.97% ======================================= Files 16 16 Lines 1356 1356 ======================================= Hits 1193 1193 Misses 163 163 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Remi-Gau commented 5 months ago

Hum... I am sorry I missed that but I had assumed that those TSV had column headers given our common principles about tabular data:

https://bids-specification.readthedocs.io/en/stable/common-principles.html#tabular-files

Each TSV file MUST start with a header line listing the names of all columns (with the exception of physiological and other continuous recordings).

effigies commented 5 months ago

Hum... I am sorry I missed that but I had assumed that those TSV had column headers given our common principles about tabular data:

https://bids-specification.readthedocs.io/en/stable/common-principles.html#tabular-files

Each TSV file MUST start with a header line listing the names of all columns (with the exception of physiological and other continuous recordings).

Yeah, in retrospect I wish we'd pushed a bit to make them either standard TSV files or stim/physio-like .tsv.gz files. As it is, they're not either, which means we now have a third way of specifying column headers, but so be it. I do not want to relitigate a merged BEP.

Remi-Gau commented 5 months ago

Fair. Should we update common principles though?

effigies commented 5 months ago

Possibly. I don't know what I'd want them to say at this point.

sappelhoff commented 5 months ago

in retrospect I wish we'd pushed a bit to make them either standard TSV files or stim/physio-like .tsv.gz files. As it is, they're not either, which means we now have a third way of specifying column headers, but so be it. I do not want to relitigate a merged BEP.

I seem to recall that this was being discussed at some point. I just fail to find the thread (e.g., in https://github.com/bids-standard/bids-examples/pull/425). @sjeung @JuliusWelzel do you remember discussions about this and could potentially link to them?

I don't know what I'd want them to say at this point.

Not pretty, but could do this:

https://github.com/bids-standard/bids-specification/blob/365f3219aec261654220c4bb88e02498b5b39ad8/src/common-principles.md?plain=1#L431-L433

Each TSV file MUST start with a header line listing the names of all columns
(with the exception of physiological and other continuous recordings
+, and motion.tsv files). 
JuliusWelzel commented 5 months ago

Hi, thanks for picking this up @effigies. I found a protocol from March 21, which said we would use headers in the motion.tsv file. For me, having headers in the motion.tsv file makes sense for a future release.

effigies commented 5 months ago

IMO we should either migrate to headers right now or not at all in BIDS 1.x. There hasn't been much data uploaded at this point and the text of the spec is ambiguous, so we can clarify in the opposite direction as well.

Remi-Gau commented 5 months ago

Slight preference for having headers for internal consistency but I may be missing the kind of headaches it may involve for motion data

sjeung commented 5 months ago

I think the protocol from 2021 was before we agreed on separating channel files for multiple tracking systems. We dropped headers after it became obsolete with the introduction of tracksys entity in names of channels.tsv files. Having no headers is consistent with BIDS validator and example data set as well.

Unfortunately I could not find any log of .gz discussion which I recall was around community review phase, and the main reason against .gz was time pressure before release of the upcoming BIDS version back then. So I would be happy to make motion spec coherent with other continuous time series data .tsv files in BIDS like stim or physio in the future. (no headers, and tsv.gz)

effigies commented 5 months ago

Ah, yes. I read that as "March 21, 2023", not "March 2021". If that is that out-of-date, then I do not think that qualifies as a decision that was missed during merge, but an outdated decision.

I am +1 for allowing motion to be .tsv or .tsv.gz, but there was just a suggestion in #197 to permit Parquet files, which I think would be a better fit than either. If that goes places, I would say we don't add .tsv.gz, but simply add .parquet.