Hi Professor Manning,
I responded in the slack group, but I will write everything here as well.
Hey @TomHaoChang-- responses below:
I think you're computing MSE incorrectly. If there are 25 elements, each squared error can range from 0 to 4. So if you sum together the errors, the range of that sum has to be between 0 and 100. If you divide that by 25 to take the mean, you'll get something between 0 and 4, inclusive. (Couldn't tell from your response if you realized this or if you're still confused about it.)
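For reference, a minimal numpy sketch of that computation, using hypothetical 5x5 stand-in matrices (the variable names here are made up for illustration):

```python
import numpy as np

# hypothetical 5x5 recovered and true correlation matrices;
# each entry lies in [-1, 1], so each squared error lies in [0, 4]
recovered = np.random.uniform(-1, 1, (5, 5))
true_corr = np.random.uniform(-1, 1, (5, 5))

# the sum of the 25 squared errors is in [0, 100]; dividing by 25
# gives an MSE between 0 and 4, inclusive
mse = np.mean((recovered - true_corr) ** 2)
```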
Re: cholesky method-- great. Please keep generating data using correlations rather than covariances, and use timecorr to estimate correlations (rather than covariances).
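A minimal sketch of Cholesky-based data generation from a target correlation matrix (the 2-feature matrix here is a made-up example):

```python
import numpy as np

# hypothetical target correlation matrix: positive definite, ones on the diagonal
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
L = np.linalg.cholesky(R)

# transform standard-normal draws so their covariance is R; because the
# marginal variances are all 1, R is also their correlation matrix
T = 1000
data = np.random.randn(T, R.shape[0]) @ L.T
```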
There are two informative aspects of the plots with the colored lines. First, we want to know that the correlations recovered in each block match the true correlations for those blocks. (Your version shows us this.) Second, we want to know that the correlations recovered outside of each block do not match the correlations for those blocks. This gives us a sense of how correlation matrices match by chance for this dataset. (Your version leaves this second part out.)
Great; the Fisher z-transformation will give us more accurate information.
I forgot about the comparisons between the sliding window version and timecorr. I think that would still be valuable-- but I'd like to see them on the same plot (with the x-axis for the sliding window version shifted to line up with the timecorr version). You could make the sliding window lines have the same colors as the timecorr lines, but with alpha=0.5 (i.e. so the sliding window versions will appear as a lighter shade on the same plot).
To clarify number 5 (previous comment): I think you should plot the sliding window and timecorr plots on the same axis (rather than as separate sub-plots) so that we can more easily compare the shapes of the curves. Also, can you upload the figures to this issue rather than posting to slack? (Trying to get us to consolidate our notes...)
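A minimal matplotlib sketch of that overlay, using stand-in curves and a hypothetical window length:

```python
import numpy as np
import matplotlib.pyplot as plt

T, w = 300, 51                             # timepoints and window length (hypothetical)
recovered_tc = np.random.rand(T)           # stand-in timecorr recovery curve
recovered_sw = np.random.rand(T - w + 1)   # stand-in sliding-window curve (shorter)

plt.plot(np.arange(T), recovered_tc, color='C0')
# shift the sliding-window x values by half a window so the two curves line
# up in time, and draw it as a lighter shade of the same color
plt.plot(np.arange(T - w + 1) + w // 2, recovered_sw, color='C0', alpha=0.5)
plt.show()
```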
To further clarify number 5, do you want me to plot sliding window and timecorr on the same plot, and old timecorr in a subplot right next to them?
In addition, are variance=1000, samples=10, and sliding window=51 a good set of parameters to use?
If not, how should I display variations in parameters?
I'd make separate sets of plots for new and old timecorr, both with the sliding window version on the same axis. Regarding the parameters, you'll need to explore those a bit to see how to match the two methods.
Hi Professor Manning, before I start coding everything in again, I just want to confirm the equations for old timecorr: [equations attached as an image] Is this correct? Or do we want to make some modifications?
the equations aren't quite right: [corrections attached as an image]
I enforced the condition that the sum of the gaussian coefficients is equal to 1 by dividing every coefficient by the sum. Should I not have done that?
not sure...can you update the equations accordingly? i didn't see that in the equations you wrote above.
[updated equations attached as images]
there are still two issues: [details attached as an image]
How about: [suggested equations attached as an image]
Please check over the next version very very carefully before you upload it-- it's really frustrating to keep giving you the same feedback! Here's what's still wrong: [corrections attached as an image]
hey @TomHaoChang i'm sorry for getting frustrated...i should have read your note more carefully before responding. i now see that N and \mathcal{N} are different. my fault.
however, i think you should change N to \mathcal{N} (i.e. use the standard Normal distribution) and make the changes i had described initially (above). the current notation is confusing...and i'm not sure it's correct. but if you switch to the standard Normal distribution i'll be more confident about it.
Does this look okay? [equations attached as an image]
still not quite-- i think this is what you want:
[suggested equations attached as an image, with a link to an editable version]
sorry...one sec, mine is wrong
corrected: [revised equations attached as an image]
so it's a little different from what you uploaded
ugh...you know what, i think your latest version is basically correct. time for me to retire, i think. mine is still wrong-- let me correct it again.
ok...i'm now confusing myself. i think this is right:

$$\bar{x}_t^i = \frac{1}{Z_t} \sum_{s=0}^{T} \mathcal{N}(s \mid t, \sigma)\, x_s^i \quad (1)$$

$$Z_t = \sum_{s=0}^{T} \mathcal{N}(s \mid t, \sigma) \quad (2)$$

$$\sigma_t^i = \sqrt{\frac{1}{Z_t} \sum_{s=0}^{T} \mathcal{N}(s \mid t, \sigma) \left(x_s^i - \bar{x}_t^i\right)^2} \quad (3)$$

$$\rho_t^{ij} = \frac{1}{Z_t} \sum_{s=0}^{T} \frac{\mathcal{N}(s \mid t, \sigma) \left(x_s^i - \bar{x}_t^i\right)\left(x_s^j - \bar{x}_t^j\right)}{\sigma_t^i\, \sigma_t^j} \quad (4)$$
i'm pretty sure we don't need to multiply Z by T, but i'm now questioning everything. can you try implementing the "old" version using these equations and we'll correct as needed? or if you notice an obvious error, let me know that too.
the key difference from what you wrote is in equation 2-- first of all, $Z$ depends on the timepoint being reconstructed, so it needs to be recomputed for every timepoint (hence the subscript $t$). second, there's no T in front of that sum-- Z_t is just the sum of the weights from the Normal distribution.
My intuition is that, in a parallel universe where we defined \mathcal{N}(s \mid t, \sigma) = 1, equations 1, 3, and 4 would be standard averages. But instead we're computing expectations with respect to the Normal distribution evaluated at the integers between 0 and T, inclusive.
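A minimal numpy sketch of equations 1-4, treating sigma as the variance of the Gaussian weights (the function name and signature here are hypothetical, not the actual timecorr API):

```python
import numpy as np
from scipy.stats import norm

def old_timecorr(X, sigma):
    """X: (T, K) timepoints-by-features array; sigma: variance of the weights."""
    T, K = X.shape
    s = np.arange(T)
    corrs = np.zeros((T, K, K))
    for t in range(T):
        w = norm.pdf(s, loc=t, scale=np.sqrt(sigma))    # N(s | t, sigma)
        w = w / w.sum()                                 # normalize by Z_t (eq. 2)
        mu = w @ X                                      # weighted means (eq. 1)
        Xc = X - mu
        sd = np.sqrt(w @ (Xc ** 2))                     # weighted stds (eq. 3)
        corrs[t] = (Xc.T * w) @ Xc / np.outer(sd, sd)   # weighted corrs (eq. 4)
    return corrs
```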
Okay, I think I agree with this version of the equation. It makes sense intuitively
:+1: looking forward to seeing how it goes!
so the main difference with the "new" method is in how we estimate \sigma_t^i and \sigma_t^j, right? i.e. in the "new" method we're just using a block of 3 timepoints, whereas in the "old" method we're using a Gaussian weighted average. is that correct?
our estimate of those standard deviations follows from the formula for expected standard deviation: https://wikimedia.org/api/rest_v1/media/math/render/svg/5a0e3e2724af7c91cea2d6a7a2fc0d17be086d78
so now i'm pretty sure the correctly implemented "old" method should work at least as well as the "new" method...
Yes, the new timecorr method estimates a correlation fragment at each timepoint with the standard correlation equation using a block of 3 timepoints, and then applies Gaussian averaging over the correlation fragments.
The old timecorr method skips the correlation-fragment estimation step and directly applies Gaussian averaging inside the correlation function, with modifications to the mean and the standard deviation.
Both methods make sense intuitively, but we need to test them out to see.
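For comparison, a rough sketch of the "new" fragment-based variant as described above (again hypothetical code, not the actual implementation):

```python
import numpy as np
from scipy.stats import norm

def new_timecorr(X, sigma):
    """Correlation fragment per timepoint from a 3-timepoint block, then
    gaussian-averaged over time. X: (T, K); sigma: variance of the weights."""
    T, K = X.shape
    frags = np.zeros((T, K, K))
    for t in range(T):
        block = X[max(0, t - 1):min(T, t + 2)]   # 3 timepoints (2 at the edges)
        frags[t] = np.corrcoef(block.T)          # standard correlation equation
    s, out = np.arange(T), np.zeros_like(frags)
    for t in range(T):
        w = norm.pdf(s, loc=t, scale=np.sqrt(sigma))
        w = w / w.sum()
        out[t] = np.tensordot(w, frags, axes=(0, 0))   # gaussian average
    return out
```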
BTW, here are the four graphs for the new timecorr.
Block correlation: variance=1000, samples=10, sliding window=51 [figures attached]
Ramp correlation: variance=1000, samples=10, sliding window=51 [figures attached]
those look fantastic!!
if i'm understanding correctly, those figures show that: [summary list not preserved]
yep, and also timecorr handles sudden transitions much better than sliding window, along with the added benefit of not needing to cut off important data on the sides
Nice. Those are great benefits that will also allow us to do levelup analyses.
And here are the corresponding graphs for the old timecorr.
Block correlation: variance=1000, samples=10, sliding window=51 [figures attached]
Ramp correlation: variance=1000, samples=10, sliding window=51 [figures attached]
Old timecorr is performing almost as well as the new timecorr. It seems like the new timecorr has better MSE while the old timecorr has slightly better correlation. I want to vary the sample size a bit and see what happens.
Great! Given that the "old" formula is more defensible, has one fewer parameter, and performs as well or better than the "new" formula, I think we have a good case for going with the old version. So let's switch to the old version from here on. It would be useful to play around with the sigma parameter to see how that interacts with recovery for both synthetic datasets.
(I'd also get rid of the samples parameter from the timecorr API.)
I think it's worth discussing whether MSE or correlation is a better indicator of goodness of recovery. Having a higher correlation with the ground truth means that the overall structure of the recovered data looks more similar to the ground truth. However, having a lower MSE means that the actual values of the recovered correlations are closer to the ground truth.
Looking across the graphs, it looks like although the old timecorr does slightly better in the correlation graph, the new timecorr consistently achieves better MSE. I also want to play around with the number of samples to see how much of a difference that will make on the correlation and the MSE. Let's make a decision after I get the results for sample size = 3.
Is it also a possibility to include both methods in my thesis?
Correlation will be more important for decoding, so I think we should stick with that version. Also, the "new" version is problematic in that the formula is not as defensible and it doesn't perform reliably better. If you look at the magnitudes of the differences in performance, the MSE values for the new version are slightly lower, but the correlation values for the old version are consistently much higher than anything else. So overall the old version seems to perform better, and it is also easier to defend, as it is derived directly from the correlation formula. I would recommend just including the old version, and scrapping the new version.
Or, if you are fixated on the new version, you would need to find a scenario where it works obviously better. But I think a better use of time would be to explore the old version with various settings of sigma.
Here are the results for block correlation when samples=3: [figures attached]
I agree it makes more sense to choose the old timecorr. One thing I am a little worried about is the amount of content I will have for my thesis. After getting rid of the new timecorr and gaussian process, a large portion of my current thesis will be gone, and I am not really sure how to fill that up. I think Professor Farid is expecting around 50 pages....
you can easily get 50 pages using the "old" method-- but first let's worry about stabilizing on the method. you'll need something like 5-10 pages for the introduction, 5-10 pages for the methods, probably 20-30 pages for the results (synthetic data parameter recovery, performance benchmarking, and results for 3 fMRI datasets), and 5-10 pages for the discussion. if you need more content, we can always go more in depth on the analyses. (it doesn't make sense to use a less defensible method just to fill up space.)
Okay, got it. Writing has always been one of my greatest weak points, and I have never written anything more than 20 pages. So 50 pages is very intimidating for me haha.
Should I do a comparison between different variances for the "old" timecorr (henceforth referred to as the official timecorr method)? Maybe for values of 10, 50, 100, 500, 1000, and 10000?
Sure, that sounds good. Once you upload those figures I think you could close this issue.
And the writing will be ok. It's good information to know that you're worried about it--so in that case let's make sure to make steady progress on the writing and start early so we have plenty of time for revisions and feedback. I've added issues for each section; take a look at the writing issues and once you have something or get stuck I'm happy to provide comments or discuss whenever you're ready to.
Block correlation: [figures for variance = 10, 50, 100, 500, 1000, and 10000 attached]
Ramping correlation: [figures for variance = 10, 50, 100, 500, 1000, and 10000 attached]
It looks like 1000 gives a good balance-- fast transitions while also giving good overall correlations and low MSE. What do you think?
Yeah, I think so too. I am now running tests on 250, 500 and 750 to see if I can find a better balance. Will close this issue once I finish. Does closing an issue delete the comments in this thread?
Sounds great-- want to also add 1250 and 1500? Closing the issue does not remove the comments, and we can always reopen it later if needed.
Also: you should update the code to reflect the corrected "old" algorithm, and also upload an ipython notebook to generate these example figures. Maybe start an "example" folder to help us keep track of everything, and add a comment to this issue with a link to the notebook.
For testing, there is no need to do the transpose operation when applying the inverse Fisher z-transformation, right? Since theoretically we are just finding the average of the correlations between the recovered data and the ground truth data? It also doesn't make sense to do the transpose addition, because what we recover is not a square matrix but a one-dimensional array of correlations of length time_length. There's no symmetric structure to it, so we can probably just use the original inverse Fisher z-transformation from the Wikipedia page instead?
Can you unpack your comment? I don't understand what you're asking. But if you want to average correlations, you first need to use the Fisher z-transformation to convert correlation coefficients (r values) to z-transformed r values (i.e. z values), then average the z values, and then take the inverse Fisher z-transformation of the average z value.
I posted formulae for r2z (the Fisher z-transformation) and z2r (its inverse) on Slack a few days ago.
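For concreteness, here are those formulas as a small sketch (not necessarily the exact code posted to Slack):

```python
import numpy as np

def r2z(r):
    # Fisher z-transformation: z = 0.5 * ln((1 + r) / (1 - r)) = arctanh(r)
    return 0.5 * np.log((1 + r) / (1 - r))

def z2r(z):
    # inverse Fisher z-transformation: r = (e^(2z) - 1) / (e^(2z) + 1) = tanh(z)
    return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)

# to average correlations: z-transform, average the z values, then invert
rs = np.array([0.3, 0.5, 0.7])   # hypothetical r values
avg_r = z2r(np.mean(r2z(rs)))
```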
Sorry, let me be more specific.
Right now I am reviewing the process by which we average the correlations between the recovered correlation matrices and the true correlation matrices. The object at hand is a one-dimensional array of length num_of_timepoints, where each element represents the correlation between the recovered and true correlation matrices at that timepoint. I followed the formula you posted a few days ago: apply the Fisher z-transformation, average the z values, and then take the inverse Fisher z-transformation of the average z value.
The question I want to ask is: the (K + transpose(K))/(2*S) substitution for 2z that we discussed when applying ISFC is not relevant here, right?
Generate figures for two synthetic datasets (a blocked dataset, where the correlation matrix changes at block boundaries, and a ramping dataset, where the correlations morph gradually from the first timepoint's matrix to the last timepoint's):
then use timecorr to recover the correlations. then generate four plots for each dataset and upload here in response to this issue:
1) same plots as you were making before-- correlations between the recovered correlation matrix and the actual correlation matrices. for the blocked version, show 10 lines (one per correlation matrix) and show that each block’s line fades in for that block, and then fades out when the block is done. for the ramping version, show 2 lines (one for the first timepoint’s correlation matrix and one for the last timepoint’s), and show that the two lines cross. purpose: show that we recover the temporal structure of the dataset.
2) alternative version of plot 1: instead of correlation, show the mean squared error between the recovered matrices and true matrices (the plot should have the same format — same number of lines — but the y-axis is now showing MSE instead of correlation). purpose: show that we are recovering the magnitudes of the correlations, not just the shapes of the correlation matrices.
3) just a single line-- for each timepoint, compute the correlation between the recovered correlation matrix at that timepoint and the true correlation matrix (for that timepoint only). the main difference from plot 1 will be for the ramping dataset. rather than showing crossing lines due to the recovered correlation matrices getting gradually more/less similar to the first and last correlation matrices, you should instead see a flat line showing the per-timepoint recovery.
4) MSE version of plot 3. (same plot, but show MSE rather than correlation on the y-axis)
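A minimal sketch of the per-timepoint comparisons behind plots 3 and 4, assuming hypothetical (T, K, K) arrays of recovered and true correlation matrices:

```python
import numpy as np

T, K = 300, 5                          # hypothetical sizes
recovered = np.random.rand(T, K, K)    # stand-ins for the real matrices
true_corrs = np.random.rand(T, K, K)

# plot 3: correlation between recovered and true matrices at each timepoint
per_tp_corr = np.array([np.corrcoef(recovered[t].ravel(),
                                    true_corrs[t].ravel())[0, 1]
                        for t in range(T)])

# plot 4: MSE between recovered and true matrices at each timepoint
per_tp_mse = np.mean((recovered - true_corrs) ** 2, axis=(1, 2))
```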