CHECK A1-A2: Novel Image Label, Code Check
CHECK A3: Novel Image Label, Manual Check
exp1_16_0_novelTest_colored.xlsx, exp2_33_4_novelTest_colored.xlsx
CHECK B: Check our sliding window function, and check it against theoretical expectations with a toy dataset
CHECK C: Check significant-difference plotting at each timepoint for sliding windows
previous sliding window function:
```python
def apply_window(combo, window_length):
    '''
    input:  dataframe of behavioral data from an entire experiment
    output: dataframe of same shape where raw values have been replaced
            by the rolling window mean
    '''
    # select data from memory runs
    data = combo[combo['Trial Type'] == 'Memory'][['Attention Level', 'Familiarity Rating', 'Trial', 'Subject', 'Run']]
    # re-structure the data - each row is a trial, each column is an attn level
    df = data.pivot_table(index=['Subject', 'Run', 'Trial'], columns='Attention Level', values='Familiarity Rating')
    # apply rolling window, for each run in each subject
    window_data = df.groupby(['Subject', 'Run']).apply(lambda x: x.rolling(window_length, min_periods=1, center=True).mean())
    return window_data
```
updated window function:
```python
def apply_window(combo, window_length):
    '''
    input:  dataframe of behavioral data from an entire experiment
    output: dataframe of same shape where raw values have been replaced
            by the rolling window mean
    '''
    # select data from memory runs
    data = combo[combo['Trial Type'] == 'Memory'][['Attention Level', 'Familiarity Rating', 'Trial', 'Subject', 'Run']]
    # re-structure the data - each row is a trial, each column is an attn level;
    # pivot_table's default mean aggregation now averages across runs
    df = data.pivot_table(index=['Subject', 'Trial'], columns='Attention Level', values='Familiarity Rating')
    # apply rolling window for each subject (runs have already been averaged together)
    window_data = df.groupby(['Subject']).apply(lambda x: x.rolling(window_length, min_periods=1, center=True).mean())
    return window_data
```
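To sanity-check the updated function, here is a minimal toy sketch (the data values and structure are invented for illustration; only the column names come from our pipeline). Because `pivot_table` defaults to mean aggregation, each subject's runs are averaged into a single timecourse before the window is applied:

```python
import pandas as pd

# hypothetical toy data in the same format as `combo`:
# 2 subjects x 2 runs, 4 memory trials per run, one attention level
toy = pd.DataFrame({
    'Subject':            [1] * 8 + [2] * 8,
    'Run':                ([1] * 4 + [2] * 4) * 2,
    'Trial':              list(range(4)) * 4,
    'Trial Type':         ['Memory'] * 16,
    'Attention Level':    ['full'] * 16,
    'Familiarity Rating': [1, 2, 3, 4, 2, 3, 4, 5,
                           0, 1, 2, 3, 4, 4, 4, 4],
})

# updated pipeline: average across runs (inside pivot_table), then smooth
smoothed = apply_window(toy, window_length=3)
print(smoothed)
```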
CHECK D: plots showing timecourses and violins together, for visual comparison:
CHECK D: Numerical comparisons of means from sliding windows and raw data
Here I calculate the mean familiarity ratings from the raw data, as well as the mean familiarity from the timeseries data with sliding windows of sizes 1, 5, and 20.
We see that applying a sliding window of size 1 and then averaging yields the same mean as the raw data, while larger window sizes yield means that are slightly, and increasingly, different. This shows that the code implementing the window isn't changing the data in an undesired way (a window size of 1 reproduces the raw means exactly), but that averaging over larger sliding windows genuinely yields a different mean.
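To make this concrete, here is a minimal sketch with invented values (not our experiment data) showing the edge effect: with `min_periods=1`, the centered windows at the ends of the series average fewer points, so edge values are effectively re-weighted and the overall mean shifts for any window size > 1:

```python
import pandas as pd

s = pd.Series([1, 5, 2, 8, 3])  # toy values

raw_mean = s.mean()
w1_mean = s.rolling(1, min_periods=1, center=True).mean().mean()
w3_mean = s.rolling(3, min_periods=1, center=True).mean().mean()

# for these toy values: raw_mean == w1_mean == 3.8 (a size-1 window is an
# identity operation), but w3_mean == 4.1 because the edge windows contain
# only two points each
print(raw_mean, w1_mean, w3_mean)
```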
CHECK D: toy data showing that the average of sliding windows and the average of raw data are not identical (starting at section 5, "Show that the mean..."): Sliding_Window_Check_1 (4).pdf
Here, we are checking our sliding window plots. This involves checking the data to make sure the behavioral files have the correct information (CHECK 0, CHECK A), making sure our sliding windows average correctly over the data (CHECK B), and verifying that our timepoint-by-timepoint t-tests correctly identify points where the timecourses are significantly different (CHECK C).
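For reference, a timepoint-by-timepoint check like CHECK C could look roughly like the sketch below (this is an illustration, not the repo's actual code; the arrays, shapes, and significance threshold are all hypothetical):

```python
import numpy as np
from scipy.stats import ttest_rel

# hypothetical (n_subjects, n_timepoints) arrays of windowed familiarity
# ratings for two conditions (e.g. two attention levels)
rng = np.random.default_rng(0)
cond_a = rng.normal(3.0, 1.0, size=(20, 30))
cond_b = rng.normal(3.5, 1.0, size=(20, 30))

# paired t-test across subjects at each timepoint
t_vals, p_vals = ttest_rel(cond_a, cond_b, axis=0)

# timepoints where the two timecourses differ significantly (uncorrected)
sig_timepoints = np.where(p_vals < 0.05)[0]
print(sig_timepoints)
```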
CHECKS:
CHECK 0: behavioral data check (see issue #77)
CHECK A: novel_label_check.pdf
CHECK B: Sliding_Window_Check_1 (1)(1).pdf
CHECK C: timepoint_ttest_2-1.pdf
CHECK D: see comments
FINDINGS / CHANGES:
For sliding window plots, first take the mean for each position in the memory trial across all runs (by subject), then apply the sliding window, then plot (as opposed to first applying the window over each run, then averaging). Plots change very minimally.
Optional update: change code so that parts of the plot at which only one timepoint has a significant difference show a dot instead of a very short (.02 width) line over the timepoint. ( @jeremymanning I find the short line to be more visually consistent / easier to see, but perhaps a dot is more accurate?) A rough sketch of this rendering follows below.
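Here is one way the dot-vs-line idea could be implemented (all names here are hypothetical, not code from this repo): split the significant timepoints into runs of consecutive indices, then draw a dot for singleton runs and a line for longer ones.

```python
import numpy as np
import matplotlib.pyplot as plt

def mark_significance(ax, sig_timepoints, y=5.0):
    '''hypothetical helper: draw a line over each run of consecutive
    significant timepoints, and a dot over each isolated one'''
    sig = np.asarray(sorted(sig_timepoints))
    if sig.size == 0:
        return
    # split into runs of consecutive indices
    runs = np.split(sig, np.where(np.diff(sig) > 1)[0] + 1)
    for run in runs:
        if len(run) == 1:
            ax.plot(run[0], y, 'k.', markersize=8)    # isolated timepoint -> dot
        else:
            ax.plot([run[0], run[-1]], [y, y], 'k-')  # consecutive run -> line

fig, ax = plt.subplots()
mark_significance(ax, [3, 7, 8, 9, 15])
plt.show()
```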