stats.threshold usage is confusing

ejolly commented 5 years ago

Thresholding method on Brain_Data objects vs the threshold function behave differently. @ljchang is this intended behavior? For example, a common workflow is to:

Compute a stat map and an associate pmap (e.g. via permutation testing)
Determine a threshold via mc correction (e.g. fdr)
Mask the stat map using the the pmap that < threshold

However, threshold changes the shape of Brain_Data rather than simply masking out non-surviving voxels like the Brain_Data.threshold method. This make it impossible to append images to each other for storage or other use.

ljchang commented 5 years ago

We should talk about this and simplify and consolidate. Just took a peak at this and one thing I'm concerned about with Brain_Data.threshold is the coerce_nan flag is set to True by default.

            b.data = np.nan_to_num(b.data)

My guess is that this was added to deal with nans and to keep the shape the same, but this will be an issue depending on what the values of the map you are trying to threshold. This option assumes that values will be centered at zero, which will not always be the case.

stats.threshold looks like it uses apply_mask, which for some reason by default casts the mask into a nibabel instance and then uses nifti_masker to go back into the same space as the data to be masked. My guess is that this is to deal with use cases where mask is of a different shape than the data, which is fairly common. We might want to do a quick check if they are different shapes and only do this if they are different shapes, otherwise we will want to just do a straight boolean mask, which will be much faster.

@ejolly I'm not quite sure why it would be changing the shape of Brain_Data though. Any ideas?

Actually, this might be a bug where nifti_masker = NiftiMasker(mask_img=mask) should be nifti_masker = self.NiftiMasker(mask_img=mask). Actually, this whole code after this could be simplified as it should just be projecting mask into self space. @ejolly let me know if you agree and we can fix.

This is related to #310.

ejolly commented 5 years ago

@ljchang I think it's changing the shape for exactly the reason you suggested. The threshold function makes a call to Brain_Data.apply_mask which always changes the shape of the output (as it's intended to). However, the Brain_Data.threshold method operates on the Brain_Data.data numpy array directly, always returning back the same shape.

I guess the broader decision is, that there are two different kinds of "masking" operations a user wants to perform and we don't cleanly differentiate between the two:

1) Set all values in an existing Brain_Data object to 0 unless they're inside of a mask. This is analogous to adjusting a threshold in fsleyes or xjview for spm. It's currently what Brain_Data.threshold does, and I believe what a user wants if they make a call to the threshold function directly (e.g. masking out t-stats based on p-values and a cutoff) 2) Subset a Brain_Data object so you only get back the voxels inside of a provided mask, thereby changing the shape. This is currently what Brain_Data.apply_mask does and is super useful and a bit unique to our tools because it makes working with ROI data directly very convenient

ejolly commented 5 years ago

One possible delineation is this:

For masking operations in the style of 1. above, we make the Brain_Data.threshold a bit more flexible. If a user provides a file path or another Brain_Data instance (e.g. t_brain.threshold(p_brain, thresh=.05) just makes a call to the threshold function (like our stats methods) which performs the logic of creating a boolean mask and always returning back something of the same shape. If a user doesn't provide a file path or another Brain_Data instance, (e.g. they want percentile or value-based thresholding), it does what it currently does and just sets all values that are outside the threshold to 0, still returning something of the same shape
For subsetting operations in the style of 2. above, we can use my suggestion in https://github.com/cosanlab/nltools/issues/309 and just perform straight numpy boolean indexing if the mask is of the same shape as the Brain_Data instance, otherwise proceed with the NiftiMasker logic in the existing method code, thereby always changing the shape of the return object to be the same size as that of the mask.

How does something like that sound?

ejolly commented 1 year ago

Ugh getting burned by this again. We should really standardize behavior across stats.threshold and Brain_Data.threshold. Here's an example function that doesn't change the shape of the output and behaves likeBrain_Data.threshold:

def thr_brain_keep_size(stat, p, thr=0.05):
    """Just like threshold from nltools.stats but keeps the data shape"""
    from copy import deepcopy

    out = deepcopy(stat)
    out.data[p.data > thr] = 0
    return out

ejolly commented 1 year ago

@ljchang Few issues with students at MIND encountering this again. I'd strongly advise to replace nltools.stats.threshold with the the function above.

Is there a situation I'm missing where we want Brain_Data.shape to change after using threshold(Brain_Data)?

cosanlab / nltools

stats.threshold usage is confusing #304