Open ejolly opened 5 years ago
We should talk about this and simplify and consolidate. Just took a peak at this and one thing I'm concerned about with Brain_Data.threshold
is the coerce_nan
flag is set to True
by default.
b.data = np.nan_to_num(b.data)
My guess is that this was added to deal with nans and to keep the shape the same, but this will be an issue depending on what the values of the map you are trying to threshold. This option assumes that values will be centered at zero, which will not always be the case.
stats.threshold
looks like it uses apply_mask
, which for some reason by default casts the mask into a nibabel instance and then uses nifti_masker to go back into the same space as the data to be masked. My guess is that this is to deal with use cases where mask is of a different shape than the data, which is fairly common. We might want to do a quick check if they are different shapes and only do this if they are different shapes, otherwise we will want to just do a straight boolean mask, which will be much faster.
@ejolly I'm not quite sure why it would be changing the shape of Brain_Data though. Any ideas?
Actually, this might be a bug where nifti_masker = NiftiMasker(mask_img=mask)
should be nifti_masker = self.NiftiMasker(mask_img=mask)
. Actually, this whole code after this could be simplified as it should just be projecting mask into self space. @ejolly let me know if you agree and we can fix.
This is related to #310.
@ljchang I think it's changing the shape for exactly the reason you suggested. The threshold
function makes a call to Brain_Data.apply_mask
which always changes the shape of the output (as it's intended to). However, the Brain_Data.threshold
method operates on the Brain_Data.data
numpy array directly, always returning back the same shape.
I guess the broader decision is, that there are two different kinds of "masking" operations a user wants to perform and we don't cleanly differentiate between the two:
1) Set all values in an existing Brain_Data
object to 0 unless they're inside of a mask. This is analogous to adjusting a threshold in fsleyes or xjview for spm. It's currently what Brain_Data.threshold
does, and I believe what a user wants if they make a call to the threshold
function directly (e.g. masking out t-stats based on p-values and a cutoff)
2) Subset a Brain_Data
object so you only get back the voxels inside of a provided mask, thereby changing the shape. This is currently what Brain_Data.apply_mask
does and is super useful and a bit unique to our tools because it makes working with ROI data directly very convenient
One possible delineation is this:
Brain_Data.threshold
a bit more flexible. If a user provides a file path or another Brain_Data
instance (e.g. t_brain.threshold(p_brain, thresh=.05)
just makes a call to the threshold
function (like our stats methods) which performs the logic of creating a boolean mask and always returning back something of the same shape. If a user doesn't provide a file path or another Brain_Data
instance, (e.g. they want percentile or value-based thresholding), it does what it currently does and just sets all values that are outside the threshold to 0, still returning something of the same shapeBrain_Data
instance, otherwise proceed with the NiftiMasker
logic in the existing method code, thereby always changing the shape of the return object to be the same size as that of the mask. How does something like that sound?
Ugh getting burned by this again. We should really standardize behavior across stats.threshold
and Brain_Data.threshold
. Here's an example function that doesn't change the shape of the output and behaves likeBrain_Data.threshold
:
def thr_brain_keep_size(stat, p, thr=0.05):
"""Just like threshold from nltools.stats but keeps the data shape"""
from copy import deepcopy
out = deepcopy(stat)
out.data[p.data > thr] = 0
return out
@ljchang Few issues with students at MIND encountering this again. I'd strongly advise to replace nltools.stats.threshold
with the the function above.
Is there a situation I'm missing where we want Brain_Data.shape
to change after using threshold(Brain_Data)
?
Thresholding method on
Brain_Data
objects vs thethreshold
function behave differently. @ljchang is this intended behavior? For example, a common workflow is to:However,
threshold
changes the shape ofBrain_Data
rather than simply masking out non-surviving voxels like theBrain_Data.threshold
method. This make it impossible to append images to each other for storage or other use.