0todd0000 / spm1dmatlab

One-Dimensional Statistical Parametric Mapping in Matlab.
GNU General Public License v3.0

FWHM Calculation with Zero Variance Nodes #165

Closed DCHartlen closed 2 years ago

DCHartlen commented 2 years ago

I've encountered an issue when analyzing time-series signals that start from a common point, such as the origin (0, 0). In short, despite the two sets of signals being very different, statistical inference using SPM1D indicated no significant difference. I narrowed the issue down to the calculation of field roughness and FWHM in +geom/fwhm.m. When the variance at a node was zero, the estimated FWHM was unrealistically small (indicating a very rough field).

I believe the issue is how variance/standard deviation is being handled for zero variance nodes. In fwhm.m, the code is:

% normalize:
v       = v ./ (ssq + eps);
% ignore zero-variance nodes:
i       = isnan(v);
v       = v(~i);

When the variance at a node is zero, this code produces a value of v that is many orders of magnitude larger than over the rest of the signal. Effectively, it produces an infinity, except that the addition of eps forces it to take a defined (finite) value. A single such entry in v dramatically skews the final FWHM computation and raises the critical threshold when performing a statistical inference at a given alpha.
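To illustrate with made-up numbers (not my actual data):

% Hypothetical values illustrating the blow-up:
v   = [0.8, 1.2, 0.9];     % numerator terms at three nodes
ssq = [2.0, 0.0, 1.5];     % sums of squares; node 2 has zero variance
v ./ (ssq + eps)           % ans = [0.4, 5.4e+15, 0.6] -- node 2 explodes
isnan(v ./ (ssq + eps))    % all false: the huge value is finite, not NaN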

I'm curious whether this is a bug or whether my interpretation of SPM1D and RFT is incorrect. The code has a comment indicating that zero-variance nodes should be ignored, but i = isnan(v); has no effect because v = v ./ (ssq + eps); always produces finite values, never NaN. As a result, this code does not actually identify or ignore zero-variance nodes.

I added a workaround while debugging that explicitly removes any nodes with zero variance before computing v. This snippet would replace the zero-variance handling in fwhm.m:

% compute the mask before filtering, then drop zero-variance nodes entirely:
keep = ssq > eps;
v    = v(keep) ./ ssq(keep);
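With the hypothetical values from the sketch above, keep = [true, false, true], so the zero-variance node is dropped entirely rather than being mapped to a huge finite value.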

I've found this change improves how SPM1D interprets the signals I'm working with, at least from a visual standpoint. However, I'm still left wondering whether this is a bug or whether I'm misinterpreting things on my end. Conceptually, it makes sense to me that v would be extremely large at zero-variance nodes, but the resulting average FWHM/field roughness doesn't make physical sense given the context of the signal.

I would really appreciate your insight on this problem. Thanks!

0todd0000 commented 2 years ago

Thank you for reporting this issue. Your interpretation sounds correct, and there is indeed a problem somewhere: spm1d should return an error, or at least a warning, if a zero-variance node is found.

That "ignore zero-variance nodes" code snippet is implemented only for masking purposes, where nodes outside user-specified regions of interest (ROIs) are assigned NaN values. If you are not using ROIs, then an error or warning should be generated at the data-checking stage, well before the FWHM estimation stage is reached.

To check whether or not this is a bug, please do the following:

  1. Download a clean copy of the most recent version of spm1d.
  2. Create a minimal script that reproduces the problem.
  3. Attach the script and any outputs (figures, warnings, errors, etc.) that demonstrate the problem.

Please ensure that you use a clean copy of spm1d, with no source-code modifications; otherwise it will not be possible for me to track down the problem's source.

DCHartlen commented 2 years ago

Hi Todd,

I'm not seeing errors or warnings thrown explicitly by spm1d. However, when I run a two-sample Hotelling's T2 test, I do receive the following warning when spm1d processes a zero-variance node:

Warning: Matrix is singular to working precision. 
In spm1d.stats.hotellings2>here_hotellings2_single_node (line 49)
In spm1d.stats.hotellings2 (line 24)

I do not see this warning when I run a two-sample t-test on signals with zero-variance nodes. However, zero-variance nodes are still passed to fwhm.m during t-test inference, and they raise the field threshold.

DCHartlen commented 2 years ago

Hi,

Sorry I didn't attach a script or images before; I'm afraid I can't share the data I'm working with. However, I have worked up a simple test script that demonstrates the error. The attached script uses a synthetic dataset with explicit zero padding and runs a two-sample t-test rather than the Hotelling's T2 test I mentioned in my previous post, though I've experienced the zero-variance issue with both tests.

No errors or warnings are thrown in the presence of multiple zero-variance nodes. However, monitoring the inputs to fwhm.m shows that zero-variance nodes are still passed into the function when assessing global FWHM. In the case of this script, the estimated FWHM drops by about three orders of magnitude when zero-variance nodes are included in the calculation.
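In outline, the script does something like this (a simplified sketch with made-up signal shapes, not the actual attachment):

% Synthetic signals with explicit zero padding, so the first few
% nodes have exactly zero variance across observations:
rng(0);
Q  = 101;                                     % nodes per signal
yA = 0.1*randn(10, Q) + sin(linspace(0, pi, Q));
yB = 0.1*randn(10, Q);
yA(:, 1:5) = 0;                               % zero padding -> zero variance
yB(:, 1:5) = 0;
t  = spm1d.stats.ttest2(yA, yB);              % zero-variance nodes reach fwhm.m
ti = t.inference(0.05);                       % critical threshold is inflated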

I hope this helps. ZeroVarianceTestScript.zip

0todd0000 commented 2 years ago

OK, thank you very much for the script and the explanations. I have added a zero-variance checker to the source code, and it should now return an error that looks something like this:

--- spm1d ZERO VARIANCE ERROR ---
Zero variance detected at the following nodes:
    50  70
Remove all zero-variance nodes before running tests.

Please update your spm1d code to the new version:

spm1d version M.0.4.9 (2022.03.24)

This should solve the problem for all procedures in spm1d.stats, where "solve" means only that the error message will alert users to zero variance in the data.
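For reference, the checker amounts to something like the following (a simplified sketch; refer to spm1d.util.check_zero_var in the repository for the exact implementation):

function check_zero_var(y)
% Simplified sketch of the zero-variance check.
% y: (observations x nodes) data array
i = find( var(y, 0, 1) == 0 );   % nodes with zero variance across observations
if ~isempty(i)
    error( sprintf(['--- spm1d ZERO VARIANCE ERROR ---\n' ...
        'Zero variance detected at the following nodes:\n    %s\n' ...
        'Remove all zero-variance nodes before running tests.'], num2str(i)) );
end
end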

DCHartlen commented 2 years ago

Thanks for the update. I can confirm that the new version throws an error when the input signals include zero-variance nodes.

Would it be possible to have spm1d handle zero-variance nodes internally? Perhaps the same error-checking function could remove or otherwise mask zero-variance nodes and continue the SPM calculations, throwing a warning instead of an error. I understand that this would take a fair amount more work to implement.
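For illustration, I imagine something along these lines (a purely hypothetical helper, not existing spm1d code):

function y = drop_zero_var_nodes(y)
% Hypothetical helper: warn about and remove zero-variance nodes
% instead of raising an error, then continue the analysis.
bad = ( var(y, 0, 1) == 0 );          % zero-variance nodes (columns)
if any(bad)
    warning('spm1d:zeroVariance', ...
        'Zero variance at nodes %s; excluding them from the analysis.', ...
        num2str(find(bad)));
    y = y(:, ~bad);                   % continue with the remaining nodes
end
end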

DCHartlen commented 2 years ago

Hi Todd,

Thanks for the explanation; it helped clear some things up for me. However, I'm not able to run the example you posted with the updated version of spm1d. Even with the ROI, the new check_zero_var function throws a zero-variance error.

Looking at +stats/ttest2.m, I'm pretty sure check_zero_var is being passed all the nodes, not just those in the ROI. The current error-checking code is:

[yA,yB]       = deal(spm1d.util.flatten(yA), spm1d.util.flatten(yB));
spm1d.util.check_zero_var(yA);
spm1d.util.check_zero_var(yB);

Incorporating the ROI into this check would be pretty straightforward, for example:

parser        = inputParser;
addOptional(parser, 'roi', [], @(x)isempty(x) || ((islogical(x) || isnumeric(x)) && isvector(x)));
parser.parse(varargin{:});
roi           = parser.Results.roi;

[yA,yB]       = deal(spm1d.util.flatten(yA), spm1d.util.flatten(yB));
if isempty(roi)
    spm1d.util.check_zero_var(yA);          % no ROI: check all nodes
    spm1d.util.check_zero_var(yB);
else
    spm1d.util.check_zero_var(yA(:,roi));   % ROI given: check only ROI nodes
    spm1d.util.check_zero_var(yB(:,roi));
end

It looks like there are similar issues in the other statistical tests.

0todd0000 commented 2 years ago

Apologies, you are correct that an error is still raised, and please disregard my ROI script above.

My new response is as follows:

Your suggested edit will indeed work for this case, but will not work for arbitrary ROIs due to downstream calculation effects. Consider for example the following ROI:

roi = [0, 1, 0, 1, 1, 1, 1, ...];

i.e., the first and third nodes are excluded from the ROI, and the data have zero variance at those two nodes.

One problem is that smoothness cannot be estimated at node 2, because temporal derivatives cannot be estimated at single, isolated points.
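To see why, consider a finite-difference derivative when the two masked neighbors cannot be used (hypothetical values):

% Node 2 is inside the ROI, but its neighbors (nodes 1 and 3) are masked.
% If the masked values are unusable (e.g., NaN), every difference that
% touches node 2 is undefined:
y  = [NaN, 0.7, NaN, 0.9, 1.1];
diff(y)                          % ans = [NaN, NaN, NaN, 0.2]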

More generally, there are two types of ROI to consider:

For a proper ROI, where the data outside the ROI are numerically valid but simply not of interest, the roi specified above poses no problem: temporal derivatives can still be estimated at the isolated point because the neighboring points are numerically sound.

For a hack ROI, where the ROI is used to mask numerically problematic data, the derivative estimate can blow up due to zero variance, as you correctly note in your first post of this thread. Similar numerical problems would be encountered with NaN, inf, and complex values.

Proper support for hack ROIs would indeed be possible, but it would require a major rewrite of the code (to support zero variance, NaN, inf, complex values, etc.), together with extensive simulation scripts involving arbitrary ROIs to demonstrate the numerical validity of the new procedures. I am reluctant to implement this solution, partly because it would take a long time to implement properly, and partly because it could encourage poor practice: one could simply use ROIs to hide dataset problems.

I therefore prefer the current behavior: return an error when there are zero-variance nodes, irrespective of whether an ROI is specified.

0todd0000 commented 2 years ago

Please note: I have deleted my script from this thread so that others are not tempted to implement something similar.

DCHartlen commented 2 years ago

Alright, I understand. Thank you for your detailed responses in this matter. They have been very helpful in understanding spm1d more deeply.