testing: synthetic data figures

jeremymanning commented 7 years ago

Generate figures for two synthetic datasets:

dataset 1: data drawn in 10 blocks of 100 samples, each with a different correlation matrix.
dataset 2: every sample gets a different correlation matrix, but give the dataset structure by gradually fading from one correlation matrix to another (the same analysis you had done with the time-varying covariance version, but with correlation matrices)

then use timecorr to recover the correlations. then generate four plots for each dataset and upload here in response to this issue:

1) same plots as you were making before-- correlations between the recovered correlation matrix and the actual correlation matrices. for the blocked version, show 10 lines (one per correlation matrix) and show that each block’s line fades in for that block, and then fades out when the block is done. for the ramping version, show 2 lines (one for the first timepoint’s correlation matrix and one for the last timepoint’s), and show that the two lines cross. purpose: show that we recover the temporal structure of the dataset.

2) alternative version of plot 1: instead of correlation, show the mean squared error between the recovered matrices and true matrices (the plot should have the same format — same number of lines — but the y-axis is now showing MSE instead of correlation). purpose: show that we are recovering the magnitudes of the correlations, not just the shapes of the correlation matrices.

3) just a single line-- for each timepoint, compute the correlation between the recovered correlation matrix at that timepoint and then true correlation matrix (for that timepoint only). the main difference from plot 1 will be for the ramping dataset. rather than showing crossing lines due to the recovered correlation matrices getting gradually more/less similar to the first and last correlation matrices, you should instead see a flat line showing the per-timepoint recovery.

4) MSE version of plot 3. (same plot, but show MSE rather than correlation on the y-axis)

TomHaoChang commented 7 years ago

Hi Professor Manning,

I responded in the slack group, but I will write everything here as well.

For MSE going over 4, I found the sum of squares for every element wise diference between the true correlation and the recovered correlation. Because there are 5*5=25 elements, I think it makes sense that MSE can go over 4. I just realized, I should have used mean. Will fix that right away
All the tests were generated using the cholesky method
For the plot with colored lines, I thought it would look less messy if I plotted in segments. Plotting through the entire 1000 samples would actually be easier. I will implement that
I will implement the fisher-Z-transformation
Do you still want me to include sliding window in the comparison between timecorr and old timecorr?

jeremymanning commented 7 years ago

Hey @TomHaoChang-- responses below:

I think you're computing MSE incorrectly. If there are 25 elements, each squared error can range from 0 to 4. So if you sum together the errors, the range of that sum has to be between 0 and 100. If you divide that by 25 to take the mean, you'll get something between 0 and 4, inclusive. (Couldn't tell from your response if you realized this or if you're still confused about it.)
Re: cholesky method-- great. Please keep generating data using correlations rather than covariances, and use timecorr to estimate correlations (rather than covariances).
There are two informative aspects of the plots with the colored lines. First, we want to know that the correlations recovered in each block match the true correlations for those blocks. (Your version shows us this.) Second, we want to know that the correlations recovered outside of each block do not match the correlations for those blocks. This gives us a sense of how correlation matrices match by chance for this dataset. (Your version leaves this second part out.)
Great; the Fisher z-transformation will give us more accurate information
I forgot about the comparisons between the sliding window version and timecorr. I think that would still be valuable-- but I'd like to see them on the same plot (with the x-axis for the sliding window version shifted to line up with the timecorr version). You could make the sliding window lines have the same colors as the timecorr lines, but with alpha=0.5 (i.e. so the sliding window versions will appear as a lighter shade on the same plot).

jeremymanning commented 7 years ago

To clarify number 5 (previous comment): I think you should plot the sliding window and timecorr plots on the same axis (rather than as separate sub-plots) so that we can more easily compare the shapes of the curves. Also, can you upload the figures to this issue rather than posting to slack? (Trying to get us to consolidate our notes...)

TomHaoChang commented 7 years ago

To further clarify number 5, do you want me to plot sliding window and timecorr on the same plot, and old timecorr in a subplot right next to them?

In addition, is variance=1000, samples=10 and sliding window = 51 a good set of parameter to use?

If not, how should I display variations in parameters

jeremymanning commented 7 years ago

I'd make separate sets of plots for new and old timecorr, both with the sliding window version on the same axis. Regarding the parameters, you'll need to explore those a bit to see how to match the two methods.

TomHaoChang commented 7 years ago

Hi Professor Manning, before I start coding everything in again, I just want to confirm the equations for old timecorr again: where Is this correct? Or do we want to make some modifications?

jeremymanning commented 7 years ago

the equations aren't quite right:

Z doesn't equal T-- that normal distribution doesn't sum to one over the range being summed over. so you actually have to calculate that sum.
the \bar{v}_t^i equation should be normalized by Z, not T
the \sigma_t^i equation needs a 1/Z in front of the sum (but still inside the square root)

TomHaoChang commented 7 years ago

I enforced the condition that the sum of the gaussian coefficients is equal to 1 by dividing every coefficient by the sum. Should I not have done that?

jeremymanning commented 7 years ago

not sure...can you update the equations accordingly? i didn't see that in the equations you wrote above.

TomHaoChang commented 7 years ago

where

jeremymanning commented 7 years ago

there are still two issues:

Z isn't equal to T
you still need a 1/Z term in front of the sum inside the squareroot for the \sigma_t^i equation

TomHaoChang commented 7 years ago

How about:

where

jeremymanning commented 7 years ago

Please check over the next version very very carefully before you upload it-- it's really frustrating to keep giving you the same feedback! Here's what's still wrong:

The new normal equation you posted doesn't actually sum to 1-- i.e. \sum_s=0^T N(s|t,\sigma) does not equal 1. The area under the Normal is equal to 1 only if you sum over -\infty to \infty. You're not doing that here.
Z (still) isn't equal to T

jeremymanning commented 7 years ago

hey @TomHaoChang i'm sorry for getting frustrated...i should have read your note more carefully before responding. i now see that N and \mathcal{N} are different. my fault.

however, i think you should change N to \mathcal{N} (i.e. use the standard Normal distribution) and make the changes i had described initially (above). the current notation is confusing...and i'm not sure it's correct. but if you switch to the standard Normal distribution i'll be more confident about it.

TomHaoChang commented 7 years ago

Does this look okay?

where

jeremymanning commented 7 years ago

not quite still-- i think this is what you want:

editable equations here

jeremymanning commented 7 years ago

sorry...one sec, mine is wrong

jeremymanning commented 7 years ago

corrected:

so it's a little different from what you uploaded

jeremymanning commented 7 years ago

ugh...you know what, i think your latest version is basically correct. time for me to retire, i think. mine is still wrong-- let me correct it again.

jeremymanning commented 7 years ago

ok...i'm now confusing myself. i think this is right:

i'm pretty sure we don't need to multiply Z by by T, but i'm now questioning everything. can you try implementing the "old" version using these equations and we'll correct as needed? or if you notice an obvious error, let me know that too.

jeremymanning commented 7 years ago

the key difference from what you wrote is in equation 2-- first of all, $Z$ depends on the timepoint being reconstructed, so it needs to be recomputed for every timepoint (hence the subscript $t$). second, there's no T in front of that sum-- Z_t is just the sum of the weights from the Normal distribution.

My intuition is that in a parallel universe if we defined a new N(s|t,\sigma) = 1 then equations 1, 3, and 4 would be standard averages. But instead we're computing expectations with respect to the Normal distribution evaluated at the integers between 0 and T, inclusive.

TomHaoChang commented 7 years ago

Okay, I think I agree with this version of the equation. It makes sense intuitively

jeremymanning commented 7 years ago

:+1: looking forward to seeing how it goes!

jeremymanning commented 7 years ago

so the main difference with the "new" method is in how we estimate \sigma_t^i and \sigma_t^j, right? i.e. in the "new" method we're just using a block of 3 timepoints, whereas in the "old" method we're using a Gaussian weighted average. is that correct?

jeremymanning commented 7 years ago

our estimate of those standard deviations follows from the formula for expected standard deviation: https://wikimedia.org/api/rest_v1/media/math/render/svg/5a0e3e2724af7c91cea2d6a7a2fc0d17be086d78

source: https://en.wikipedia.org/wiki/Standard_deviation

jeremymanning commented 7 years ago

so now i'm pretty sure the correctly implemented "old" method should work at least as well as the "new" method...

TomHaoChang commented 7 years ago

Yes, the new timecorr method estimates the correlation fragment at each time point with the standard correlation equation using a block of 3 timepoints, and then applying a gaussian averaging over the correlation fragments.

The old timecorr method skips over the correlation fragment estimation step and directly applies gaussian averaging in the correlation function, with modifications to the mean and the standard deviation.

Both methods makes sense intuitively, but we need to test them out to see.

TomHaoChang commented 7 years ago

BTW are the four graphs for the new timecorr. Block correlation: variance=1000, samples=10, sliding window=51 download

Ramp correlation: variance=1000, samples=10, sliding window=51 download 1

jeremymanning commented 7 years ago

those look fantastic!!

jeremymanning commented 7 years ago

if i'm understanding correctly, those figures show that:

timecorr largely agrees with sliding window estimates
both methods (timecorr and sliding windows) recover correlations that are about equally correlated with the ground truth correlations
timecorr recovers correlations that more closely match the magnitude of the ground truth correlations (i.e. lower MSE than using sliding windows)

TomHaoChang commented 7 years ago

yep, and also timecorr handles sudden transitions much better than sliding window, along with the added benefit of not needing to cut off important data on the sides

jeremymanning commented 7 years ago

Nice. Those are great benefits too that will also allow us to do levelup analyses.

TomHaoChang commented 7 years ago

Block correlation: variance=1000, samples=10, sliding window=51

Ramp correlation: variance=1000, samples=10, sliding window=51

Old timecorr is performing almost as well as the new timecorr. It seems like timecorr has better MSE while old timecorr has slightly better correlation. I want to change sample size around a bit and see what happens

jeremymanning commented 7 years ago

Great! Given that the "old" formula is more defensible, has one fewer parameter, and performs as well or better than the "new" formula, I think we have a good case for going with the old version. So let's switch to the old version from here on. It would be useful to play around with the sigma parameter to see how that interacts with recovery for both synthetic datasets.

jeremymanning commented 7 years ago

(I'd also get rid of the samples parameter from the timecorr API.)

TomHaoChang commented 7 years ago

I think it's worth discussing if MSE or correlation is a better indicator of goodness of recovery. Having a higher correlation with the ground truth means that the overall structure of the data looks more similar with the ground truth. However, having a higher MSE means that the actual values of the recovered MSE are closer to the ground truth.

Looking across the graphs, it looks like although the old timecorr does slightly better in the correlation graph, the new timecorr consistently achieves better MSE. I also want to play around with the number of samples to see how much of a difference that will make on the correlation and the MSE. Let's make a decision after I get the results for sample size = 3.

Is it also a possibility to include both methods in my thesis?

jeremymanning commented 7 years ago

Correlation will be more important for decoding, so I think we should stick with that version. Also, the "new" version is problematic in that the formula is not as defensible and it doesn't perform reliably better. If you look at the magnitudes if the differences in performance, the MSE values for the new version are slightly lower, but the correlation values for the old version are consistently much higher than anything else. So overall the old version seems to perform better, and is also easier to defend, as it is derived directly from the correlation formula. I would recommend just including the old version, and scraping the new version.

jeremymanning commented 7 years ago

Or, if you are fixated on the new version, you would need to find a scenario where it works obviously better. But I think a better use of time would be to explore the old version with various settings of sigma.

TomHaoChang commented 7 years ago

Here are the results for block correlation when sample=3

I agree it makes more sense to choose the old timecorr. One thing I am a little worried about is the amount of content I will have for my thesis. After getting rid of the new timecorr and gaussian process, a large portion of my current thesis will be gone, and I am not really sure how to fill that up. I think Professor Farid is expecting around 50 pages....

jeremymanning commented 7 years ago

you can easily get 50 pages using the "old" method-- but first let's worry about stabilizing on the method. you'll need something like 5-10 pages for the introduction, 5-10 pages for the methods, probably 20-30 pages for the results (synthetic data parameter recovery, performance benchmarking, and for 3 fMRI datasets), and 5-10 pages for the discussion. if you need more content, we can always go more in depth on the analyses. (it doesn't make sense to use a less defensible method just to fill up space.)

TomHaoChang commented 7 years ago

Okay, got it. Writing has always been one of my greatest weak points, and I have never written anything more than 20 pages. So 50 pages is very intimidating for me haha.

Should I do a comparison between different variances for the "old" timecorr (henceforth referred as the official timecorr method)? Maybe for values of 10, 50, 100, 500, 1000 and 10000?

jeremymanning commented 7 years ago

Sure, that sounds good. Once you upload those figures I think you could close this issue.

And the writing will be ok. It's good information to know that you're worried about it--so in that case let's make sure to make steady progress on the writing and start early so we have plenty of time for revisions and feedback. I've added issues for each section; take a look at the writing issues and once you have something or get stuck I'm happy to provide comments or discuss whenever you're ready to.

TomHaoChang commented 7 years ago

Block correlation:

Ramping correlation:

jeremymanning commented 7 years ago

It looks like 1000 gives a good balance-- fast transitions while also giving good overall correlations and low MSE. What do you think?

TomHaoChang commented 7 years ago

Yeah, I think so too. I am now running tests on 250, 500 and 750 to see if I can find a better balance. Will close this issue once I finish. Does closing an issue delete the comments in this thread?

jeremymanning commented 7 years ago

Sounds great-- want to also add 1250 and 1500? Closing the issue does not remove the comments, and we can always reopen it later if needed.

jeremymanning commented 7 years ago

Also: you should update the code to reflect the corrected "old" algorithm, and also upload an ipython notebook to generate these example figures. Maybe start an "example" folder to help us keep track of everything, and add a comment to this issue with a link to the notebook.

TomHaoChang commented 7 years ago

For testing, there is no need to do the transpose operation when applying the inverse Fisher Z transformation right? Since theoretically we are just finding the average of the correlation between the recovered data and ground truth data? It also doesn't make sense to do the transpose addition because the correlation matrix we recover is not square but a one-dimensional matrix of length time_length. There's no symmetric structure to this matrix, so we can probably just use the original inverse Fisher Z transformation on the wikipedia page instead?

jeremymanning commented 7 years ago

Can you unpack your comment? I don't understand what you're asking. But if you want to average correlations, you first need to use the Fisher z-transformation to convert correlation coefficients (r values) to z-transformed r values (i.e. z values), then average the z values, and then take the inverse Fisher z-transformation of the average z value.

I posted formulae for r2z (the Fisher z-transformation) and z2r (its inverse) on Slack a few days ago.

TomHaoChang commented 7 years ago

Sorry, let me be more specific.

Right now I am reviewing the process with which we average the correlation between recovered correlation matrix and the true correlation matrix. The matrix at hand is a one-dimensional matrix of length num_of_timepoints, where each element represents the correlation between the recovered correlation matrix and the true correlation matrix at the timepoint. I followed the formula you posted a few days ago where I apply Fisher Z-transformation, then average the z values, and then take the inverse Fisher z-transformation of the average z values.

The question I want to ask is: the (K+transpose(K))/(2*S) substitution for 2z that we discussed when applying ISFC is not relevant here right?

ContextLab / timecorr

testing: synthetic data figures #2