@cmaumet Thanks for the aggregation. I think for usability, we should probably try to identify when identical models are available under different names and choose a common name.
To summarize here:
- nistats: 'spm', 'glover', or 'fir'
- FSL: 'Double-Gamma HRF', 'Gaussian HRF', 'Gamma HRF', 'Custom basis set', 'Gamma basis set', 'Sine basis set', 'FIR basis set'
- SPM: 'Canonical HRF'
- AFNI: 'BLOCK', 'GAM', 'TWOGAM', 'SPMG1', 'WAV', 'MION'
I have a suspicion that we can make some of the following equivalences, but these would all need to be verified:
| BIDS name | SPM | FSL | AFNI | nistats |
|---|---|---|---|---|
| gamma | | Gamma HRF | GAM | |
| doublegamma | Canonical HRF | Double-Gamma HRF | TWOGAM | spm, glover |
| fourier | Fourier Set | Sine basis set | WAV | |
| gammabasis | Gamma Functions | Gamma basis set | | |
| fir | Finite Impulse Response | FIR basis set | | fir |
(Unlisted: SPM.Fourier Set (Hanning), SPM.None, FSL.Custom basis set, AFNI.BLOCK, AFNI.SPMG1, AFNI.MION)
FSL's Gaussian HRF will not be supported, but any unsupported HRFs can go into the software-specific arguments.
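To make these candidate equivalences easier for tools to consume, something like the following lookup could sit alongside the table. This is only a sketch; every cell still needs the verification noted above, and `None` just means "no obvious built-in equivalent".

```python
# Draft lookup mirroring the table above; all entries are unverified guesses.
HRF_EQUIVALENTS = {
    "gamma":       {"SPM": None,                      "FSL": "Gamma HRF",        "AFNI": "GAM",    "nistats": None},
    "doublegamma": {"SPM": "Canonical HRF",           "FSL": "Double-Gamma HRF", "AFNI": "TWOGAM", "nistats": ("spm", "glover")},
    "fourier":     {"SPM": "Fourier Set",             "FSL": "Sine basis set",   "AFNI": "WAV",    "nistats": None},
    "gammabasis":  {"SPM": "Gamma Functions",         "FSL": "Gamma basis set",  "AFNI": None,     "nistats": None},
    "fir":         {"SPM": "Finite Impulse Response", "FSL": "FIR basis set",    "AFNI": None,     "nistats": "fir"},
}
```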
Hey @cmaumet & @effigies,
not sure if I'm off here, but SPM also has the following: FIR, Fourier Set, Fourier Set (Hanning), and Gamma functions. I also added a comment in the respective section of the Google doc.
This misses out several SPM options.
Fourier Set, Fourier Set (Hanning), Gamma Functions, Finite Impulse Response, None
Gamma functions are a set of 3 bases: fast, medium, and slow (scaled versions of the 'single gamma').
I think I've seen all of these used at some point, and so they deserve consideration (especially if we're going to keep 'Gaussian HRF', which is evil and never used AFAIK).
BTW, None is insane, but offered in FSL and AFNI too.
Updated the SPM entries in the table with my best guesses (I assume a fourier set is a sine basis), but reiterate the need to confirm, ideally by generating toy design matrices in each environment.
> (especially if we're going to keep 'Gaussian HRF' which is evil and never used AFAIK)
If it's unused to the point that it's not needed for the sake of representing historical models (i.e., to reproduce a published model run on an open dataset), I'm okay with dropping it.
> BTW, None is insane, but offered in FSL and AFNI too.
I assume this would be equivalent to "Densify" in the current set, so wouldn't need special treatment by "ConvolveHRF". Added it to the list of things missing from the table, for reference, however.
Table looks good to me. I suggest that we only include HRFs that are supported by at least 2 major packages in the controlled vocabulary. If you want to use something that only one package supports, you can specify it in the `software` section, and that way at least it's clear that nothing else will be able to fit the model.
If the table is the list of convolutions we can do ourselves, then it shouldn't really matter how many packages will support them directly, as we'll be passing an already prepared design matrix. The point is more to say, "If you're used to doing this, use this name to get the same behavior," and actually to widen the scope of HRFs available to users of any given package.
So in that sense, removing Gaussian would simply force people to pass it manually to FSL, which is the only place they should have been able to do it before. Which should help serve as a deterrent for future use while not disabling the replication of past models.
WDYT?
Oh, also, for the sake of actually being a standard, I think a mathematical, reference, and/or pseudocode implementation of each kernel we support through `ConvolveHRF` would be a good idea. That way, biRds (or whatever the R package will be) can verify its implementation.
I agree that supporting all HRFs in any package would be optimal, but the cost to that is that somebody has to then implement it all (in every supported language). I guess we can go ahead anyway, and the implementing package can simply complain that it doesn't understand the named HRF if it can't support it.
Re: a symbolic or pseudocode implementation, that would be nice, but seems like a lot of work. I'm not sure who would have time to take this on...
> I guess we can go ahead anyway, and the implementing package can simply complain that it doesn't understand the named HRF if it can't support it.
This would be my suggestion. It's also an opportunity for people who want a specific HRF to help implement it in the package they want to use. Maybe we should say "conforming applications SHOULD advertise which HRF kernels they support", but if they do support them, they should use these names.
> Re: a symbolic or pseudocode implementation, that would be nice, but seems like a lot of work. I'm not sure who would have time to take this on...
For symbolic approaches, there should be a reference paper with the relevant parameters. It is a fair point that it's a lot of work, so maybe this would be another good thing to leave as a valuable potential contribution; I would want solid review of any reference implementations that will go directly into a spec.
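To give a sense of the scale of work involved, here is a rough, unreviewed sketch of what a reference implementation of the canonical double-gamma might look like, assuming SPM-style defaults (peak delay 6 s, undershoot delay 16 s, unit dispersions, undershoot ratio 1/6). Normalization conventions differ between packages, so the peak-normalization here is arbitrary:

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(t, peak_delay=6.0, undershoot_delay=16.0,
                     peak_disp=1.0, undershoot_disp=1.0, ratio=1.0 / 6.0):
    """Double-gamma HRF evaluated at times ``t`` (seconds), peak-normalized.

    Defaults are the SPM-style canonical parameters; other packages'
    defaults would need to be confirmed before claiming equivalence.
    """
    peak = gamma.pdf(t, peak_delay / peak_disp, scale=peak_disp)
    undershoot = gamma.pdf(t, undershoot_delay / undershoot_disp,
                           scale=undershoot_disp)
    hrf = peak - ratio * undershoot
    return hrf / hrf.max()
```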
@afni-gangc Would you mind reviewing the AFNI column of https://github.com/bids-standard/stats-models/issues/56?
Has "micro time resolution" been specified as an option somewhere? It differs between FSL (dt=0.05s) and SPM (dt=TR/16) and I'm not sure what it is in AFNI. (This is the increased time resolution in which the events - or blocks - are convolved with the HRF, improving the convolution that would otherwise be done at TR resolution.) If not accounted for we will not reproduce the softwares' design matricies.
@nicholst, sure, we could add that as an option. Okay if we call it `oversampling`, and handle it as in SPM (i.e., as the denominator in `TR / oversampling`)?
@tyarkoni That won't handle the constant dt well. It would force users to calculate `TR / 0.05s` if they want to match historical FSL usage. Which is maybe not a problem, if we want to push users towards the SPM approach.
Well, I think we want to have a uniform approach... I don't want to support both FSL and SPM-style arguments. I don't like the FSL approach mainly because specifying the smallest unit of temporal sampling seems to me like either an implementation issue, or something that the user should specify globally somewhere else in the model (since this same issue comes up for any kind of dense transformation, not just HRF convolution).
Either of those options seems reasonable to me. I agree that supporting both approaches is ugly; I just have no sense of how much choosing one over the other would lead to half the community doing manual division and having weird magic numbers in all of their models.
Beyond a certain temporal resolution, I don't think it matters. So whether as `oversampling` or an absolute `convolution_dt`, you'll get similar results, I think. So I'd pick whichever you find convenient (I thought the absolute FSL dt was more elegant but, again, I don't really think it matters which you choose).
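Either way, the operation is the same: build the regressor on a fine grid and then sample it at the scan times. Here's a rough sketch (my own naming, nothing normative) where `dt` can equally be an absolute value or derived from `TR / oversampling`:

```python
import numpy as np

def convolve_events(onsets, durations, amplitudes, hrf, tr, n_scans, dt=0.05):
    """Convolve sparse events with an HRF kernel sampled every ``dt`` seconds,
    then sample the dense result at the middle of each TR.

    ``dt`` may be absolute (FSL-style 0.05 s, AFNI's built-in 0.01 s) or
    computed as ``tr / oversampling`` (SPM-style); the procedure is identical.
    """
    n_fine = int(np.ceil(n_scans * tr / dt))
    stick = np.zeros(n_fine)
    for onset, duration, amplitude in zip(onsets, durations, amplitudes):
        start = int(round(onset / dt))
        stop = max(start + 1, int(round((onset + duration) / dt)))
        stick[start:min(stop, n_fine)] += amplitude
    dense = np.convolve(stick, hrf)[:n_fine]
    scan_times = (np.arange(n_scans) + 0.5) * tr
    idx = np.minimum(np.round(scan_times / dt).astype(int), n_fine - 1)
    return dense[idx]
```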
Some comments about AFNI's builtin HRF models:
TWOGAM could be made equal to other programs' models, with the proper specification of parameters for the gamma variates.
WAV is not a Fourier basis set, and the WAV model is only present for historical reasons now. The closest direct analogy to a Fourier expansion for the HRF in AFNI is denoted by 'SIN', which is a sine series (duh). An analogous model is 'POLY' = Legendre polynomial series -- probably never used in history.
FIR modeling is done in AFNI using one of the 'TENT' or 'CSPLIN' models (TENT = linear splines, CSPLIN = ... wait for it ... cubic splines).
Changing the names of models, or adding new models, is relatively easy in AFNI -- requires a few changes to the source code, but those are pretty straightforward. I'm on board with standardization as long as my suffering is kept at a reasonable limit.
In AFNI, the built-in 'micro time' resolution is 0.01s. There is no option to change this, but I suppose that could be added - although I'm not sure why.
Given that 2 of 3 packages use an absolute sampling rate, should we go for that? Using the lower of the two, AFNI's dt=0.01s seems like a pretty safe option.
Also, if it wasn't explicit, this upsampling should be on by default as part of HRF convolution... I'm not sure why you'd ever not want to use it. (Yes, someone someday might have TR=0.01s, but (a) the HRF is very smooth at that resolution and (b) users can always change the upsampling resolution, e.g. to dt=0.001s, via some option.)
That's fine with me. But I don't think this should be a property of the HRF specification anyway; it should be a general property of the whole analysis pipeline. Since many transformations have to convert between sparse and dense formats, and potentially resample different dense variables to the same temporal resolution, I think optionally setting a `dt` property at the level of the entire analysis (or at least, each step) is probably the way to go. A default of 100 Hz seems reasonable to me.
Re: why you might not want to use it, the main reason is computational efficiency. It's probably not worth worrying about in the context of model estimation procedures that take hours, but when you have hundreds of subjects and dozens of variables, upsampling to >= 100 Hz can really start to slow down the construction of the design matrix. But I agree that this should be a secondary consideration, and users can explicitly sacrifice smoothness if they really need speed.
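Concretely, I'm imagining something like the following (field names here are hypothetical and not part of the current draft), with a single sampling-rate setting per step that HRF convolution and every other sparse-to-dense transformation would respect:

```python
# Hypothetical step-level setting, shown as a Python dict standing in for the
# model JSON; "SamplingRate" would apply to all sparse-to-dense transformations.
run_step = {
    "Level": "run",
    "SamplingRate": 100,  # Hz, i.e. dt = 0.01 s
    "Transformations": [
        {"Name": "ConvolveHRF", "Input": ["trial_type"], "Model": "doublegamma"},
    ],
}
```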
Actually, while we're on the topic of this tradeoff, @adelavega, one thing we should consider for the NeuroScout UI is using a much lower temporal resolution (e.g., 1 Hz) for the initial sanity check/collinearity diagnostics we show users on demand, and then only compiling with high temporal resolution once they've finalized the model.
Oh, wait, that probably won't work, because events are already stored at a fixed sampling rate in the DB. Never mind.
@effigies: I like your table; it gives a good high-level view of available HRFs, but I don't think we can claim any of those are strictly equivalent. E.g., the `doublegamma` in SPM (the canonical HRF) is different from the `doublegamma` in FSL. Is the idea to let the neuroimaging software pick the closest (available) equivalent?
I would be (very) happy with that even though that's not what I had understood from previous discussions.
Ah, no. If they're different, we should give them different names, IMO, otherwise the specification ceases to be software-independent. Unless we want to allow parameters, which could allow `doublegamma` to become equivalent to SPM's or FSL's, and specify the parameters for each.
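For example (hypothetical field names again, shown as a Python dict standing in for the model JSON), a parameters object could pin the generic name to one package's variant; the values below are the SPM-style defaults:

```python
# Hypothetical: a generic model name plus explicit parameters, so
# "doublegamma" can be disambiguated once each package's defaults are verified.
convolve_spm_style = {
    "Name": "ConvolveHRF",
    "Input": ["trial_type"],
    "Model": "doublegamma",
    "Parameters": {
        "PeakDelay": 6.0,             # seconds
        "UndershootDelay": 16.0,      # seconds
        "PeakDispersion": 1.0,
        "UndershootDispersion": 1.0,
        "UndershootRatio": 1.0 / 6.0,
    },
}
```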
I think that we should be honest and clarify that the current approach to BIDS models is not so much about being 'software independent' as about (from the point of view of existing analysis software) outsourcing design matrix creation to a 3rd-party tool.
It is important to be clear about this because, once design matrix creation is outsourced like this, a subset of modelling software functionality will be lost. Specifically, event-related plotting of the stimulus response in SPM and FSL will be unavailable, as will first-level effect size extraction (e.g. with FSL's featquery or the popular SPM extension MarsBaR).
I think such a reduction in methodological plurality could be a good thing, but we should be explicit about it and account for it when we are anticipating uptake of the framework.
@nicholst, account for it where/how? Is there specific wording you'd like to include in the spec?
I guess I dislike the "outsourcing" framing because it implies that there's something privileged about the existing packages that do everything. I prefer to think of it in terms of modularity: having a software-independent standard means that the bar is much lower to building tools that can handle just one part of the typical analysis pipeline, rather than having to do everything. So it isn't about having one 3rd party tool do design matrix creation; it's about creating an entire ecosystem of interoperable tools, where some do data ingestion, some do design matrix construction, some do estimation, and so on.
I'm not sure why widespread adoption should result in tools like featquery or MarsBaR no longer being used. Presumably, if people use FSL to fit first-level models, they can still use featquery to interrogate the first-level results, no? If you mean that the spec itself doesn't have any provision for plots, that's true, but (a) it could be extended at some point, and (b) package developers are welcome to construct their own mini-language within the spec that just gets passed through in the `software` section (i.e., FSL just needs to come up with its own vocabulary for encoding featquery calls, and then users can include that in the JSON).
The issue is about where the metadata for the events live. Event-time-locked plots (the "peristimulus time histogram" in SPM, and some version of this in FSL's HTML report) require it and won't be available when a design matrix is loaded in directly.
This won't affect the final statistic maps (if we succeed in exactly recreating each software's convolution), but it impacts the usability of each package when working with BIDS model input.
The alternative view is that a BIDS Model is a model-building recipe that any software can use to implement an analysis.
Closing this as it seems to have been addressed by #36
I've done a first pass at this in the spec document: https://docs.google.com/document/d/1bq5eNDHTb6Nkx3WUiOBgKvLNnaa5OMcGtD0AZ9yms2M/edit#
@effigies: is that what you were after?