dawnfinzi / streams


Difference in noise levels across ROIs is biasing results [12/03/20 - Ongoing] #1

Closed by dawnfinzi 3 years ago

dawnfinzi commented 3 years ago

When comparing RSMs across ROIs (correlating via Pearson's r), the comparison is necessarily biased by the level of signal vs. noise in the data. For example, suppose we have two candidate "models" A and B and one "target" dataset C: if A is very noisy while B is not, A may correlate less strongly with C even if A is actually a better model of C's responses than B. This seems to be a particular problem for our ROIs, as the split-half reliability of the ventral ROI is consistently higher than that of the lateral ROI within subject, indicating a difference in noise levels.

RSMcorr_ventralVSlateral_voxThresh20_abs_zscore
Correlation within and across ventral and lateral ROIs for each subject, 20% variance explained threshold

dawnfinzi commented 3 years ago

Suggested fix (from meeting with Kendrick - 12/02/20)

Disattenuation: correcting for the noisiness of the candidate models (ROIs).

Say we have a dataset that can be divided into data1 and data2. The split-half correlation between data1 and data2 is equal to the square root of the variance that is shared across data1 and data2, where the shared variance = (NC_data1/100) * (NC_data2/100). In general, the noise ceiling is defined as:

NC = 100 * (signal_SD^2 / (signal_SD^2 + noise_SD^2))

We can test this with the following simulation:

results = [];
for rep = 1:500
  % Draw random signal and noise standard deviations.
  signalsd = rand * 5;
  noisesd = rand * 5;
  % Two measurements of the same signal with independent noise.
  signal = signalsd * randn(1,10000);
  data1 = signal + noisesd * randn(1,10000);
  data2 = signal + noisesd * randn(1,10000);
  r = corr(data1(:), data2(:));
  nc = 100 * (signalsd^2 / (signalsd^2 + noisesd^2));  % percentage of the variance in data1 that is signal
  %(nc/100)*(nc/100)        % fraction of variance that is shared across data1 and data2
  %sqrt((nc/100)*(nc/100))  % the Pearson r expected between data1 and data2
  results(rep,1) = r;
  results(rep,2) = sqrt((nc/100)*(nc/100));
end
figure;
scatter(results(:,1), results(:,2));

which produces this plot

Screen Shot 2020-12-03 at 3 35 41 PM (scatter of observed split-half r vs. predicted r)

showing that the split-half correlation is equal to the square root of the shared variance in the limit.

Based on this, we know that the observed r between a model and the target is:

observed_r = sqrt((NC_target/100) * (NC_model/100))

We want to correct for the noise level of the model to allow for a fair comparison between models. Thus:

adjusted_r = sqrt(observed_r^2 / (NC_model/100))
adjusted_r = sqrt(observed_r^2 * (100/NC_model))
adjusted_r = observed_r * sqrt(100/NC_model)
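As a sanity check on this correction, here is a minimal Python sketch (variable names and noise levels are made up for illustration, not taken from the repo) that simulates a noisy candidate model of a cleaner target and applies adjusted_r = observed_r * sqrt(100/NC_model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Shared "true" signal underlying both the model ROI and the target ROI.
signal = rng.standard_normal(n)

# Target: a relatively clean measurement; model: two noisy repeats.
# (Noise levels here are arbitrary, chosen just for illustration.)
target = signal + 0.2 * rng.standard_normal(n)
model_rep1 = signal + 1.0 * rng.standard_normal(n)
model_rep2 = signal + 1.0 * rng.standard_normal(n)

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

# Estimate the model's noise ceiling from its split-half reliability
# (NC = 100 * split-half r in the limit).
nc_model = 100 * pearson(model_rep1, model_rep2)

observed_r = pearson(model_rep1, target)
adjusted_r = observed_r * np.sqrt(100 / nc_model)
# adjusted_r now approximately matches pearson(signal, target),
# i.e. the correlation the model would show if it were noise-free.
```

The adjustment boosts the noisy model's correlation back toward what the noise-free signal would have produced, which is exactly the fairness property we want when comparing ROIs with different reliabilities.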

This gives us the following algorithm to compensate for the noisiness of the candidate model (i.e. disattenuate).

# for one case (matrix element) in the mega matrix
# e.g. column (target) is ROI1 repeat 1, row (model candidate) is ROI2 repeat x, where x != 1

NC = ROI2's average split-half reliability across the 3 pairs of repeats * 100  # note that NC = split-half reliability * 100 in the limit (see other simulation)

observed_r = pearsons_r( flattened(lower_triangle_of_RSM) for ROI1r1, flattened(lower_triangle_of_RSM) for ROI2rx )
adjusted_r = observed_r * sqrt(100/NC)

# there are six cases for a single matrix element! 
#(ROI2r1 -> ROI1r3, ROI2r2 -> ROI1r3, ROI2r1 -> ROI1r2, ROI2r3 -> ROI1r2, ROI2r3 -> ROI1r1, ROI2r2 -> ROI1r1) 
#so do it six times and average

# for across subjects, there are nine cases but everything else is fundamentally the same
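The steps above can be sketched in Python (function and variable names are hypothetical; the repo's actual data layout will differ) — each ROI contributes a list of flattened lower-triangle RSM vectors, one per repeat:

```python
import itertools
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

def disattenuated_r(target_rsms, model_rsms):
    """Average the observed correlation over all valid repeat pairings,
    then correct for the candidate model's noise ceiling.

    target_rsms, model_rsms: lists of flattened lower-triangle RSM
    vectors, one per repeat (a hypothetical layout for illustration).
    """
    # NC = 100 * average split-half reliability across repeat pairs.
    pairs = itertools.combinations(range(len(model_rsms)), 2)
    nc_model = 100 * np.mean([pearson(model_rsms[i], model_rsms[j])
                              for i, j in pairs])

    # Within subject there are six valid cases for 3 repeats:
    # model repeat x vs. target repeat y, with x != y.
    observed = [pearson(model_rsms[x], target_rsms[y])
                for x in range(len(model_rsms))
                for y in range(len(target_rsms)) if x != y]

    return np.mean(observed) * np.sqrt(100 / nc_model)
```

For the across-subject case, the only change would be dropping the x != y restriction, giving nine cases instead of six.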

dawnfinzi commented 3 years ago

Doing this correction for the rows (model candidate) produces this: megaMatrix_voxThresh20_adjusted_rh

But really we should be doing this for both row (model) and column (target) in order to compare fairly.

dawnfinzi commented 3 years ago

Pseudocode logic/algorithm is

Row is from any ROI any subject. Row_1, Row_2, Row_3 are the three lower triangles (r units).
Column is from any ROI any subject. Col_1, Col_2, Col_3 are the three lower triangles (r units).

r_ROI = average of [Row_i<->Row_j, i ~= j]
The noise ceiling (NC) is 100*r_ROI. This is the percentage of variance in "one-trial data" that is signal.

For any element in the megamatrix, we pull
  Row_i and Col_j where (i~=j if from same subject).
  observed_r = corr(Row_i,Col_j).
  avg_observed_r = average across all possible valid cases.
The magical adjustment is:
  adjusted_r = avg_observed_r * sqrt(100/NC_row) * sqrt(100/NC_col).

Consider the case of an element on the diagonal of the mega matrix (row and column from the same ROI):
  adjusted_r = avg_observed_r * (sqrt(100/(100*avg_observed_r)))^2 = avg_observed_r / avg_observed_r = 1
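A short Python sketch of this logic (names are my own for illustration, not the notebook's), including the diagonal sanity check that a ROI correlated with itself comes out at exactly 1 after correction:

```python
import itertools
import numpy as np

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

def noise_ceiling(reps):
    """NC = 100 * average split-half reliability across repeat pairs."""
    pairs = itertools.combinations(range(len(reps)), 2)
    return 100 * np.mean([pearson(reps[i], reps[j]) for i, j in pairs])

def megamatrix_element(row_reps, col_reps, same_subject=True):
    """One mega-matrix element, corrected for the noise level of both
    the row (model) and the column (target)."""
    # Valid pairings: exclude same-repeat pairs when both sides come
    # from the same subject (shared session noise would inflate r).
    cases = [(i, j) for i in range(len(row_reps))
             for j in range(len(col_reps))
             if not (same_subject and i == j)]
    avg_observed = np.mean([pearson(row_reps[i], col_reps[j])
                            for i, j in cases])
    return (avg_observed
            * np.sqrt(100 / noise_ceiling(row_reps))
            * np.sqrt(100 / noise_ceiling(col_reps)))
```

On the diagonal, avg_observed equals the mean split-half r, so the two sqrt factors cancel it exactly and the element is 1 up to floating-point error.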

In code, this is implemented at https://github.com/dawnfinzi/streams/blob/e00cee8cbf766bc1f2fd217566f37eba2feae6c4/notebooks/Mega_matrix_disattentuation.ipynb#L177-L211 which produces: megaMatrix_voxThresh20_adjusted_both_rh

dawnfinzi commented 3 years ago

To visualize the root of this problem across ROIs, here is the average RSM correlation (Pearson's r) across repeats (and averaged across subjects) for all right hemisphere ROIs. Notice the large discrepancy in these correlations between the ventral ROI and the other higher-level visual ROIs:

sh_RSM_withinROI_withinSubj_voxThresh20_rh