
[ENH] fMRI: proposed new metadata tag NumberOfVolumesIgnoredByUser #195

Closed: jdkent closed this issue 5 years ago

jdkent commented 5 years ago

References poldracklab/fmriprep#1559 and at least one Neurostars conversation.

The Problem

FMRIPREP automatically detects the number of non-steady-state volumes per BOLD run, but this number can vary across BOLD runs and subjects, introducing variability that some experimenters would like the option to avoid (e.g., ignoring the same number of volumes across all runs and subjects).

Proposed Solution

Similar to the existing metadata tag NumberOfVolumesDiscardedByUser, NumberOfVolumesIgnoredByUser would represent the number of volumes to be ignored by analysis tools; unlike discarded volumes, these would still be included in the dataset.

The potential downside is that different users may want to ignore different numbers of volumes per analysis, and would have to change a file in the dataset to do so.
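For concreteness, here is a minimal sketch (with hypothetical filenames, and assuming nibabel is available; the field itself is only proposed here) of how an analysis tool might honor the new tag:

```python
import json

import nibabel as nib  # assumed to be available

# Read the run's sidecar; NumberOfVolumesIgnoredByUser is the proposed
# (not yet standardized) field, defaulting to 0 when absent.
with open("sub-01_task-rest_bold.json") as f:
    n_ignore = json.load(f).get("NumberOfVolumesIgnoredByUser", 0)

# The volumes stay in the NIfTI file; the tool simply skips the first n_ignore.
img = nib.load("sub-01_task-rest_bold.nii.gz")
steady_state_data = img.slicer[..., n_ignore:]
```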

chrisgorgo commented 5 years ago

This feels more like an analysis parameter rather than a description of raw data. As you mentioned different users might prefer to set it to different values.

FMRIPREP users who want the same number of non-steady-state volumes across the whole dataset can look at the estimates for all participants/runs and take the maximum.
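A hedged sketch of that approach, assuming fMRIPrep's confounds TSVs are on disk (file and column naming varies across fMRIPrep versions, e.g. NonSteadyStateOutlierXX in 1.x versus non_steady_state_outlierXX later):

```python
from pathlib import Path

import pandas as pd

# fMRIPrep writes one non-steady-state outlier column per detected dummy volume.
counts = {}
for tsv in Path("derivatives/fmriprep").rglob("*confounds*.tsv"):
    cols = pd.read_csv(tsv, sep="\t", nrows=0).columns  # header only
    counts[tsv.name] = sum(
        c.lower().replace("_", "").startswith("nonsteadystateoutlier") for c in cols
    )

# Ignore the same (maximal) number of volumes in every run.
print("Maximum across runs:", max(counts.values()))
```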

jdkent commented 5 years ago

I'm feeling a bit flip-floppy on this now. I think the perspective @effigies would take is that this parameter is decided by the MRI technician/researcher beforehand and is independent of analysis, but now I'm starting to think it is more of an opinionated parameter without a solid ground truth. Did I represent your perspective correctly, @effigies? Or what else am I missing?

effigies commented 5 years ago

Sorry about the delay.

My assumption is that a protocol will generally have a known set of dummy scans, and that this is a reasonable thing to include in the metadata. BIDS seems to promote the practice where dummy scans are discarded either by the scanner (NumberOfVolumesDiscardedByScanner) or during the curation process (NumberOfVolumesDiscardedByUser), before making it to any preprocessing. Meanwhile fMRIPrep actually uses these for constructing references, so I'd rather not discourage curators from keeping them in.

When fMRIPrep detects different numbers of non-steady-state volumes, derivatives are affected. For example, CompCor and AROMA are performed with dummy scans removed, so failure to detect some may result in differential quality across runs. Occasionally, the number of cosine basis regressors that are used for CompCor and saved for downstream processing can increase or decrease by one depending on whether a volume was included.
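To see why the count can flip by one, here is a toy illustration; the floor(2 * duration / cutoff) rule is an assumed approximation of a DCT high-pass basis, not fMRIPrep's exact code:

```python
import math

def n_cosine_regressors(n_vols, tr, cutoff=128.0):
    # Roughly one regressor per DCT frequency below 1/cutoff Hz.
    return math.floor(2 * n_vols * tr / cutoff)

# Removing a single volume near the boundary changes the count:
print(n_cosine_regressors(160, 0.8))  # 2 (128.0 s of data)
print(n_cosine_regressors(159, 0.8))  # 1 (127.2 s of data)
```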

So it's not a post-preprocessing concern. It's a question of whether this is an attribute of the data or a preprocessing choice, and I'm inclined to see it as the former.

jdkent commented 5 years ago

I asked our resident MRI physicist (@vmagnotta) and got the following response:

The number of dummy scans should be selected based on reaching a steady state of the magnetization. This will be based on TR and flip angle. Thus it will vary based on imaging protocol.

And furthermore he stated about the variation of dummy scans between participants:

...this should not vary by more than 1 across a study since the T1 of GM/WM is not likely to vary drastically across subjects.

With this, I'm more comfortable saying that the number of dummy scans is an attribute of the data (based on TR and flip angle), as opposed to a number a particular researcher might prefer for whatever reason.

My follow-up concern was how much (dis)agreement there is across physicists in determining steady state: even if they use the same terms in their equations, they may have different interpretations of what thresholds should be used.

Vince replied:

No debate at all. The Bloch equations can be used to simulate everything in MRI.

More evidence that the number of "dummy scans" is an attribute of the data.
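As a first-order illustration of that claim, here is a minimal sketch of the Bloch recursion for longitudinal magnetization in a spoiled gradient-echo acquisition; the T1 value, tolerance, and signal model are assumptions for the example, not parameters from this thread:

```python
import math

def dummies_to_steady_state(tr, flip_deg, t1=1.4, tol=0.01):
    """Count excitations until Mz is within `tol` of its steady-state value.

    tr and t1 are in seconds; t1=1.4 is an assumed grey-matter value at 3T.
    """
    e1 = math.exp(-tr / t1)
    cos_a = math.cos(math.radians(flip_deg))
    mz_ss = (1 - e1) / (1 - e1 * cos_a)  # fixed point of the recursion below
    mz, n = 1.0, 0                       # start fully relaxed (Mz = M0)
    while abs(mz - mz_ss) > tol * mz_ss:
        mz = (1 - e1) + mz * cos_a * e1  # one excitation + T1 recovery over TR
        n += 1
    return n

print(dummies_to_steady_state(tr=2.0, flip_deg=77))  # 2 for these parameters
```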

However, I would like a second opinion just to be safe. @neurolabusc: would you be willing to offer your stance on how the number of "dummy scans" should be calculated, and whether you are aware of any (common) scenarios where MRI physicists would come to different conclusions on the same fMRI BOLD dataset?

neurolabusc commented 5 years ago

For a pragmatic solution, one could estimate the T1-effects and determine the number of volumes to discard such that the residual T1-effects have a longer period than the high-pass filter cutoff that is applied (or perhaps twice as long, to account for roll-off). As long as one discards volumes acquired prior to this point, the T1-effects have no influence on the data.

For Siemens scanners you can determine NumberOfVolumesDiscardedByScanner from the TR. Specifically: TR ≥ 3001 ms: 1 dummy scan; 1501 ms ≤ TR ≤ 3000 ms: 2 dummy scans; 1001 ms ≤ TR ≤ 1500 ms: 3 dummy scans; and so on. To the best of my knowledge, the number of dummy scans is not recorded in the DICOM header.

effigies commented 5 years ago

This calculation seems like a reasonable one to add to dcm2niix, if you can be sure that it's true of all BOLD sequences, or can confidently identify the sequence.

And if it is true of all BOLD sequences, and we can be confident that the NumberOfVolumesDiscardedByScanner and NumberOfVolumesDiscardedByUser will be present and accurate when non-zero, we could calculate:

from math import ceil

# RepetitionTime is in seconds in BIDS metadata; the Siemens rule above is in ms
expected_dummies = ceil(3001.0 / int(RepetitionTime * 1000))
remaining_dummies = max(0,
                        expected_dummies
                        - NumberOfVolumesDiscardedByScanner
                        - NumberOfVolumesDiscardedByUser)
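For example, with RepetitionTime = 2.0 (seconds, per BIDS convention), NumberOfVolumesDiscardedByScanner = 1, and NumberOfVolumesDiscardedByUser = 0, this gives expected_dummies = ceil(3001 / 2000) = 2 and remaining_dummies = 1.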

But I'm not sure we can count on either of these conditions. I think it's much more likely that the researcher knows that a certain number of dummy scans were planned to be collected prior to the start of the task.

Also want to note that this addresses poldracklab/fmriprep#1136. @alexlicohen Do you have thoughts on whether you consider this a dataset attribute or a processing choice?

jdkent commented 5 years ago

I spoke with Vince in person, and he said that in principle, if we knew:

1. the flip angle
2. the TR
3. the magnet strength
4. the T1 values for gray and white matter
5. how many volumes were discarded by the scanner (and/or user)

then it would be possible to write an algorithm that calculates the number of dummy scans. But his recommendation, in order to "keep it simple", would be to treat it as a processing choice (the algorithm appears potentially non-trivial).


But I'm not sure we can count on either of these conditions. I think it's much more likely that the researcher knows that a certain number of dummy scans were planned to be collected prior to the start of the task.

I think this is the more likely scenario as well, which suggests to me that different researchers may have different notions of what the "right" number of dummy scans should be.

jdkent commented 5 years ago

Once the above pull request is merged, I'll close this issue, so if there are any more opinions, make them known soon!