bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International
265 stars 154 forks source link

[ENH] Add compressed TSV files to the common principles #1749

Closed oesteban closed 3 months ago

oesteban commented 3 months ago

Their description is hedged within the physiological recordings so upcasting them to the common principles seems important.

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 87.93%. Comparing base (bd08602) to head (cab6e9d).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1749 +/- ## ======================================= Coverage 87.93% 87.93% ======================================= Files 16 16 Lines 1351 1351 ======================================= Hits 1188 1188 Misses 163 163 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Remi-Gau commented 3 months ago

Slightly related to #644

Remi-Gau commented 3 months ago

I think that this reverts some of #1699, no? Specifically the exception to the rule for tsv for motion data as motion.tsv are header less.

oesteban commented 3 months ago

It feels lines 46-48 of the diff should stay. The first parenthesis is addressed, but can be made more explicit either here or (better, IMHO) within motion.

I can take care of harmonizing this.

On Mon, Mar 25, 2024, 13:05 Remi Gau @.***> wrote:

I think that this reverts some of #1699 https://github.com/bids-standard/bids-specification/pull/1699, no? Specifically the exception to the rule for tsv for motion data as motion.tsv are header less.

— Reply to this email directly, view it on GitHub https://github.com/bids-standard/bids-specification/pull/1749#issuecomment-2017854619, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAESDRRFSSQD2DHQ4JZKKC3Y2AHIBAVCNFSM6AAAAABFGOEVG6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJXHA2TINRRHE . You are receiving this because you authored the thread.Message ID: @.***>

oesteban commented 3 months ago

I think that this reverts some of #1699, no? Specifically the exception to the rule for tsv for motion data as motion.tsv are header less.

Okay, I'm realizing a problem -- motion's tsv files ARE NOT compressed, is that right?

I'll push a commit improving some of my changes not to undo the motion exception.

Remi-Gau commented 3 months ago

I think that this reverts some of #1699, no? Specifically the exception to the rule for tsv for motion data as motion.tsv are header less.

Okay, I'm realizing a problem -- motion's tsv files ARE NOT compressed, is that right?

I'll push a commit improving some of my changes not to undo the motion exception.

Yup. Sorry. Pre coffee me was not clear. This is an "inconsistency" we realized only recently .

oesteban commented 3 months ago

Okay, compatibilization with #1699 is done (https://github.com/bids-standard/bids-specification/pull/1749/commits/c4ee8d68cdff253055f86ed82c3992de8562bd35).

Slightly related to #644

I doubled down with https://github.com/bids-standard/bids-specification/pull/1749/commits/4eadad536ca962c9a113b4f4aeaa6f195938f20f. I am happy to see that changed if #644 leads the table somewhere else later on.

oesteban commented 3 months ago

the html looks good and the pdf good enough

I just simplified the admonitions' titles so the pdf should be even less far from the HTML now.

effigies commented 3 months ago
By the regulations, we should give this until Monday (Apr 1) for community review. ![regulations](https://github.com/bids-standard/bids-specification/assets/83442/50b818a4-e39a-48ba-849c-e6b1a0aa7c60)
effigies commented 3 months ago

I did not read this as intending to make .tsv.gz available for all, but to bring its definition into common principles.

oesteban commented 3 months ago

No, the intent is not to allow TSVGZ beyond what it is used for already. I'll try to think how to make it more clear.

On Tue, Mar 26, 2024, 17:45 Stefan Appelhoff @.***> wrote:

@.**** commented on this pull request.

Thanks for the proposal!

Do I understand it correctly that the current proposal makes gzipped TSV files valid for all TSV files (not just physio and stim, as it was before)?

If yes, wouldn't that also need some additions to the BIDS schema?

I read the PR and left two comments. In general, I am in favor, but I feel like this is a big change: With this, users may supply any TSV file they want as TSVGZ. This might break a lot of tools initially. Perhaps we need to clarify this bit:

Tabular data containing large numbers of rows MAY be saved as compressed tabular files (with extension .tsv.gz) <#m_1909724630272335821_compressed-tabular-files> as prescribed below.

What's a "large number of rows"?

Somewhat related:


In src/common-principles.md https://github.com/bids-standard/bids-specification/pull/1749#discussion_r1539700504 :

     }

} }

+### Compressed tabular files

I would make this a sub-heading under tabular files. ⬇️ Suggested change

-### Compressed tabular files +#### Compressed tabular files


In src/modality-specific-files/physiological-and-other-continuous-recordings.md https://github.com/bids-standard/bids-specification/pull/1749#discussion_r1539705833 :

@@ -38,8 +35,12 @@ before the suffix. For example for the file sub-control01_task-nback_run-1_bold.nii.gz, <matches> would correspond to sub-control01_task-nback_run-1.

-Note that when supplying a *_<physio|stim>.tsv.gz file, an accompanying -*_<physio|stim>.json MUST be supplied as well. +!!! note "TSVGZ headers are specified in metadata files." +

  • TSVGZ files MUST NOT include a header line,
  • as established by the common-principles).

⬇️ Suggested change

— Reply to this email directly, view it on GitHub https://github.com/bids-standard/bids-specification/pull/1749#pullrequestreview-1961085811, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAESDRUKDRQPSHIWHVBZMK3Y2GJY3AVCNFSM6AAAAABFGOEVG6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSNRRGA4DKOBRGE . You are receiving this because you authored the thread.Message ID: @.***>

sappelhoff commented 3 months ago

I'll try to think how to make it more clear.

Thanks, I'll mark the parts that I find a bit ambiguous.

oesteban commented 3 months ago

@effigies @Remi-Gau @sappelhoff please check the changes to make unambiguous that .tsv and .tsv.gz are not interchangeable https://github.com/bids-standard/bids-specification/pull/1749/commits/48ffe9d1a920b4fe9bde3be69063b703e8ebd3ed

It feels too verbose to me, but I agree with @sappelhoff we need to avert any diverting readings of the specs. TBH, understanding that tsv.gz can be used instead of tsv at every instance felt a bit forced to me, but it's better to make the interpretation absolutely unambiguous.

oesteban commented 3 months ago

Considering that the minimum period for discussion has been safely satisfied, the very incremental nature of the proposal, the absence of major side effects on the specs, and the possibility of ironing around yet-to-be-identified wrinkles through the PR mechanism anytime in the future, I think I can go ahead and merge.

There are a few aspects to follow up from here: