Disattenuation: correcting for the noisiness of the candidate models (ROIs)

Say we have a dataset that can be divided into data1 and data2. The split-half correlation between data1 and data2 is equal to the square root of the fraction of variance that is shared across data1 and data2, where the shared fraction of variance = (NC_data1/100) * (NC_data2/100). In general, the noise ceiling is defined as:

NC = 100 * (signal_SD^2 / (signal_SD^2 + noise_SD^2))

We can test this with the following simulation:
```matlab
results = [];
for rep = 1:500
  signalsd = rand * 5;
  noisesd = rand * 5;
  signal = signalsd * randn(1,10000);
  data1 = signal + noisesd * randn(1,10000);
  data2 = signal + noisesd * randn(1,10000);
  r = corr(data1(:), data2(:));
  nc = 100 * (signalsd^2 / (signalsd^2 + noisesd^2));  % percentage of variance in data1 that is signal
  % (nc/100)*(nc/100)       : fraction of variance that is shared across data1 and data2
  % sqrt((nc/100)*(nc/100)) : the Pearson r expected between data1 and data2
  results(rep,1) = r;
  results(rep,2) = sqrt((nc/100)*(nc/100));
end
figure;
scatter(results(:,1), results(:,2));
```
which produces a plot showing that, in the limit, the split-half correlation is equal to the square root of the shared variance.
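This also follows directly (a one-line derivation, for completeness): the noise in data1 and data2 is independent, so cov(data1, data2) = signalsd^2, while std(data1) = std(data2) = sqrt(signalsd^2 + noisesd^2). Hence r = signalsd^2 / (signalsd^2 + noisesd^2) = NC/100 = sqrt((NC_data1/100) * (NC_data2/100)) when the two halves are equally noisy.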
Based on this, we know that for model_1 against the target:

observed_r = sqrt((NC_target/100) * (NC_model/100))

We want to correct for the noise level of the model to allow for a fair comparison between models. Thus:

adjusted_r = sqrt(observed_r^2 / (NC_model/100))
           = sqrt(observed_r^2 * (100/NC_model))
           = observed_r * sqrt(100/NC_model)
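As a minimal sketch of this correction in Python (the function name `disattenuate_model` is ours, not from the repo):

```python
import numpy as np

def disattenuate_model(observed_r, nc_model):
    """Correct an observed model-target correlation for the model's
    noise level, given the model's noise ceiling NC (in percent)."""
    return observed_r * np.sqrt(100.0 / nc_model)

# e.g. a model with NC = 64 and an observed r of 0.4 is credited with
# r = 0.4 * sqrt(100/64) = 0.5
print(disattenuate_model(0.4, 64.0))  # 0.5
```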
This gives us the following algorithm to compensate for the noisiness of the candidate model (i.e., to disattenuate):
```
# for one case (matrix element) in the mega matrix
# e.g. column (target) is ROI1 repeat 1, row (model candidate) is ROI2 repeat x, where x != 1
NC = ROI2's average split-half reliability across the 3 pairs of repeats * 100
     # note: NC = split-half reliability * 100 in the limit (see other simulation)
observed_r = pearsons_r( flattened(lower_triangle_of_RSM) for ROI1r1,
                         flattened(lower_triangle_of_RSM) for ROI2rx )
adjusted_r = observed_r * sqrt(100/NC)

# there are six valid cases for a single matrix element:
# (ROI2r1 -> ROI1r3, ROI2r2 -> ROI1r3, ROI2r1 -> ROI1r2,
#  ROI2r3 -> ROI1r2, ROI2r3 -> ROI1r1, ROI2r2 -> ROI1r1)
# so do it six times and average
# across subjects there are nine cases, but everything else is fundamentally the same
```
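A runnable Python sketch of this per-element computation might look like the following (helper names are ours, assuming each ROI contributes its three repeat RSMs as square numpy arrays; this is not the notebook's exact code):

```python
import numpy as np
from itertools import permutations

def lower_triangle(rsm):
    """Flatten the lower triangle of an RSM, excluding the diagonal."""
    rows, cols = np.tril_indices(rsm.shape[0], k=-1)
    return rsm[rows, cols]

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

def disattenuated_element(target_rsms, model_rsms):
    """One mega-matrix element, correcting only for the model ROI's noise.
    target_rsms / model_rsms: lists of 3 repeat RSMs (same subject)."""
    t = [lower_triangle(r) for r in target_rsms]
    m = [lower_triangle(r) for r in model_rsms]
    # model ROI's split-half reliability, averaged over the 3 pairs of repeats
    split_half = np.mean([pearson(m[i], m[j])
                          for i in range(3) for j in range(i + 1, 3)])
    nc = 100 * split_half
    # the six valid cases: model repeat i against target repeat j, i != j
    observed = [pearson(m[i], t[j]) for i, j in permutations(range(3), 2)]
    return np.mean(observed) * np.sqrt(100 / nc)
```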
Doing this correction for the rows (model candidate) produces the result below. But really we should be doing this for both the row (model) and the column (target) in order to compare fairly.
The pseudocode logic/algorithm is:
The row is from any ROI, any subject. Row_1, Row_2, Row_3 are the three lower triangles (in r units).
The column is from any ROI, any subject. Col_1, Col_2, Col_3 are the three lower triangles (in r units).
r_ROI = average of [corr(Row_i, Row_j), i ~= j]
The noise ceiling (NC) is 100*r_ROI. This is the percentage of the variance in "one-trial data" that is signal.
For any element in the megamatrix, we pull Row_i and Col_j (with i ~= j if they come from the same subject).
observed_r = corr(Row_i, Col_j)
avg_observed_r = average across all possible valid cases (six within subject, nine across subjects)
The magical adjustment is:
adjusted_r = avg_observed_r * sqrt(100/NC_row) * sqrt(100/NC_col)
As a sanity check, consider an element on the diagonal of the mega matrix, where NC_row = NC_col = 100*avg_observed_r:

adjusted_r = avg_observed_r * (sqrt(100/(100*avg_observed_r)))^2 = avg_observed_r / avg_observed_r = 1
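Extending the sketch above (same hypothetical helpers; again, not the notebook's exact code), the row-and-column correction might look like:

```python
def disattenuated_element_full(row_rsms, col_rsms, same_subject=True):
    """One mega-matrix element, correcting for the noise ceilings of
    both the row (model) ROI and the column (target) ROI."""
    r = [lower_triangle(x) for x in row_rsms]
    c = [lower_triangle(x) for x in col_rsms]
    nc_row = 100 * np.mean([pearson(r[i], r[j])
                            for i in range(3) for j in range(i + 1, 3)])
    nc_col = 100 * np.mean([pearson(c[i], c[j])
                            for i in range(3) for j in range(i + 1, 3)])
    # six valid cases within subject (i != j), nine across subjects
    pairs = [(i, j) for i in range(3) for j in range(3)
             if not (same_subject and i == j)]
    avg_observed_r = np.mean([pearson(r[i], c[j]) for i, j in pairs])
    return avg_observed_r * np.sqrt(100 / nc_row) * np.sqrt(100 / nc_col)
```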
In code this is https://github.com/dawnfinzi/streams/blob/e00cee8cbf766bc1f2fd217566f37eba2feae6c4/notebooks/Mega_matrix_disattentuation.ipynb#L177-L211 which produces the result below.
To visualize the root of this problem across ROIs, here is the average RSM correlation (Pearson's r) across repeats (averaged across subjects) for all right hemisphere ROIs. Notice the large discrepancy in these correlations between the ventral ROI and the other higher-level visual ROIs.
When comparing (correlating, Pearson's r) RSMs across ROIs, the comparison is necessarily biased by the level of signal vs. noise in the data. For example, with two candidate "models" A and B and one "target" dataset C: if A is super noisy while B is not, A may be less highly correlated with C even if A is a better model of C's responses than B (see the simulation sketch below). This seems to be a particular problem for our ROIs, as the split-half reliability for the ventral ROI is consistently higher than for the lateral ROI within subject, indicating a difference in noise levels.

Correlation within and across ventral and lateral ROIs for each subject, 20% variance explained threshold
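To make the A/B example above concrete, here is a quick simulation (illustrative numbers only, not from the dataset): model A tracks C's signal perfectly but is measured noisily, model B tracks a partly different signal but is measured cleanly, and the split-half-based correction recovers A as the better model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
signal = rng.standard_normal(n)                  # the true signal underlying C
target = signal + 0.5 * rng.standard_normal(n)   # one measurement of C

# A's underlying response is the signal itself; B's is only partly shared
a_true = signal
b_true = 0.6 * signal + 0.8 * rng.standard_normal(n)
model_a = [a_true + 2.0 * rng.standard_normal(n) for _ in range(2)]  # noisy repeats
model_b = [b_true + 0.2 * rng.standard_normal(n) for _ in range(2)]  # clean repeats

def r(x, y):
    return np.corrcoef(x, y)[0, 1]

for name, reps in [("A", model_a), ("B", model_b)]:
    nc = 100 * r(reps[0], reps[1])                 # NC = 100 * split-half r
    observed = np.mean([r(rep, target) for rep in reps])
    adjusted = observed * np.sqrt(100 / nc)
    print(name, "observed:", round(observed, 3), "adjusted:", round(adjusted, 3))
# A's observed r (~0.40) is lower than B's (~0.53), but after
# disattenuation A (~0.89) beats B (~0.54), as it should.
```

A's adjusted r still falls short of 1 because this snippet corrects only for the model's noise; the remaining shortfall reflects the target's own noise, which the row-and-column correction above also removes.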