mrshirts opened this issue 10 years ago
Do we even need speed here? For example, the SVD algorithm involves an `svd()`, a `pinv()`, and a couple of small matrix multiplies. In my experience, these take under one second for many choices of N and K. IMHO there are two factors here: robustness and avoiding O(N^2) memory usage.

I propose that we modify the SVD-based code to avoid O(N^2) memory usage and to use `np.linalg.pinv()`. I've done this in #105, so we just need to explore whether this works and come up with some tests. The code (below) is a trivial update of what we were already using.
```python
elif method == 'svdk':
    # Use the singular value decomposition based approach given in the
    # supplementary material to efficiently compute the uncertainty.
    # See Appendix D.1, Eq. D4 in [1].

    # Construct matrices.
    Ndiag = np.matrix(np.diag(N_k), dtype=np.float64)
    W = np.matrix(W, dtype=np.float64)
    I = np.identity(K, dtype=np.float64)

    # Compute the SVD of W. full_matrices=False avoids O(N^2) memory
    # allocation by only calculating the active subspace of U.
    [U, S, Vt] = np.linalg.svd(W, full_matrices=False)
    Sigma = np.matrix(np.diag(S))
    V = np.matrix(Vt).T

    # Compute the covariance matrix.
    Theta = V * Sigma * np.linalg.pinv(
        I - Sigma * V.T * Ndiag * V * Sigma) * Sigma * V.T
```
Also, I strongly advise having exactly one implementation, unless someone has a strong and well-documented (in code and literature) justification for several.
I favor keeping multiple implementations, for several reasons: [...] `g_mbar` someday) and this can serve as a reference implementation.

I'm still strongly in favor of finding a single method that we can agree on. Reasons below.
In the current code, it's even unclear what the recommended method is. If you follow the code, sometimes the docstrings say the default is "svd", while other times they say "svd-ev". In fact, this is a typo, because "svd-ev" isn't even one of the choices--the correct value would be "svd-ew".
> K = 1000 is a non-issue with linear algebra on modern machines.

Sure, but give K = 10000 a try.
My point is that we should, if possible, find a single implementation that works for all cases. I'm fine to support several implementations if that's impossible.
Perhaps we should be focusing our efforts on the `svd-ew` method rather than the `svd` method--this avoids storing the NxK matrix U of left singular values.
> The literature references are partially undocumented.

We could certainly fix this.
> We currently don't even know which methods are expected to work. This isn't even about testing yet, this is about deprecating code that has possibly unknown failure cases. In my experience, the different methods sometimes give different answers.

They definitely sometimes give different answers because they have different numerical behaviors. I don't know if it's even clear which ones are numerically most robust.
> In the current code, it's even unclear what the recommended method is. If you follow the code, sometimes the docstrings say the default is "svd", while other times they say "svd-ev". In fact, this is a typo, because "svd-ev" isn't even one of the choices--the correct value would be "svd-ew".

I'm not sure which one is supposed to be recommended now. Originally, it was the `svd-ew` version I had come up with in the appendix of the MBAR paper, but I thought we had been using another default in most of our work since then.
I agree this isn't an ideal scenario and that we should move toward simpler and more maintainable code. I'd love it if we could make use of one method for everything---and perhaps we can? It may just be a step backwards to get rid of everything but one method if we find we need more than that.
> My point is that we should, if possible, find a single implementation that works for all cases.

Actually, if we can find one that works for all cases, I guess I'm all for that.
OK, so the next question is for you and Michael to recall some of the "difficult" examples, so we can convert them into regression tests.
Do we have any 3D umbrella sampling synthetic datasets in there? I think that was one of the more difficult large-K cases.
I see a 1D US example in pymbar-examples, but no 3D.
Remind me to transfer a copy of the 'large datasets from others that we don't have permission to share' when I get back to NYC.
I think we can design a synthetic dataset where we generate umbrella sampling data from a large origin-centered harmonic potential U0(x) = (K0/2) (x^2 + y^2 + z^2), with local umbrella biasing potentials on a 3D grid, Uijk(x) = (Kbias/2) [(x-i)^2 + (y-j)^2 + (z-k)^2], with the ijk indices running from -5 to +5. This will give us a K = 1331 state system, with the overlap between states adjustable by choice of Kbias.

I believe the total potential U(x) = U0(x) + Uijk(x) is also harmonic, and it should be possible to generate uncorrelated samples from each state using normal random variates.
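Here's a minimal sketch of such a generator (the values of K0, Kbias, and the per-state sample count are illustrative assumptions, not choices fixed in this thread). The key point is that U0 + Uijk is harmonic with spring constant K0 + Kbias and mean Kbias (i,j,k) / (K0 + Kbias), so in reduced units (kT = 1) each state can be sampled exactly with normal random variates:

```python
import itertools
import numpy as np

K0 = 0.1        # global spring constant (illustrative)
K_bias = 1.0    # umbrella spring constant (illustrative)
n_samples = 10  # uncorrelated samples per state (illustrative)

# 3D grid of umbrella centers with i, j, k in [-5, 5] -> K = 1331 states.
centers = np.array(list(itertools.product(range(-5, 6), repeat=3)), dtype=float)
K = len(centers)

# U0 + Uijk is harmonic: spring constant K0 + K_bias, mean pulled toward the
# umbrella center, and per-dimension variance 1 / (K0 + K_bias) at kT = 1.
K_tot = K0 + K_bias
means = K_bias * centers / K_tot
sigma = np.sqrt(1.0 / K_tot)

rng = np.random.default_rng(0)
x_kn = means[:, None, :] + sigma * rng.standard_normal((K, n_samples, 3))

# Reduced potential of every sample evaluated in every state.
x_flat = x_kn.reshape(K * n_samples, 3)
u0 = 0.5 * K0 * np.sum(x_flat**2, axis=1)
u_kn = np.empty((K, K * n_samples))
for l, c in enumerate(centers):
    u_kn[l] = u0 + 0.5 * K_bias * np.sum((x_flat - c) ** 2, axis=1)
N_k = np.full(K, n_samples)
```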
OK, I also just added an `svd-ew-kab` method to my pull request, which is just the `svd-ew`-style calculation using `np.linalg.pinv` instead of the custom pseudoinverse. From the docstrings, it seems like the `svd-ew` approach is the current recommended approach. It also has a low memory profile, avoiding the O(KN) matrix of left singular vectors. If we can make this approach work generally, I think that would be ideal.
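For concreteness, here's a minimal sketch of the idea (my reading of `svd-ew` plus `np.linalg.pinv`, not the exact pymbar code): the singular values and right singular vectors of W are recovered from an eigendecomposition of the small KxK matrix W'W, so the NxK matrix of left singular vectors is never formed, and the covariance then follows Eq. D4 just as in the block above.

```python
import numpy as np

def covariance_svd_ew(W, N_k):
    """Estimate Theta via Eq. D4, using eigh(W^T W) in place of svd(W)."""
    K = W.shape[1]
    # W^T W = V diag(S^2) V^T; only K x K objects are ever materialized.
    S2, V = np.linalg.eigh(W.T @ W)
    S2 = np.clip(S2, 0.0, None)          # guard tiny negative eigenvalues
    Sigma = np.diag(np.sqrt(S2))
    Ndiag = np.diag(np.asarray(N_k, dtype=np.float64))
    inner = np.eye(K) - Sigma @ V.T @ Ndiag @ V @ Sigma
    return V @ Sigma @ np.linalg.pinv(inner) @ Sigma @ V.T
```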
Did we ever implement the method in Appendix D.2 of the MBAR paper (specifically Eq. D8)? I liked this form, since it only involves manipulation and pseudoinversion of KxK matrices. I believe it requires that all K states be unique, however...
Hi, all-
Just to chime in -- first, Kyle, thanks for taking a look at this, since this was always one of the more annoying parts of the code.
> Do we even need speed here?

Yes, speed has been an issue in the past if we are doing large scans. Though I think a chunk of the speed problem was actually doing K^2 operations on Theta after the matrix decomposition, which has mostly been cleaned up.
> In the current code, it's even unclear what the recommended method is.

Yeah, that's what we definitely need to figure out and also include in a "numerical details of MBAR" paper.
I don't think there will necessarily be a single method that works for all purposes - in some cases we might prioritize memory over speed, in others speed over memory. One could imagine actually wanting a superfast, somewhat less accurate estimate -- though all of the "less accurate" methods that we have in there are too approximate. However, all methods should have a clear purpose. If there's no reason to include one, we should toss it.
Certainly if there is a single method that is better in all aspects, then we should use it, but we would have to show that first.
> the svd-ew version

This is certainly the default in the code now.
Speed is definitely an issue.

> K = 1000 is a non-issue with linear algebra on modern machines.

We'd definitely like to be able to handle larger matrices if we can. Ideally, if we can handle a given number of states for the free energies, we'd like to be able to compute the covariance matrix for that many states as well.
> The literature references are partially undocumented.

Then this would be something to make sure that we have covered in a paper.
> We currently don't even know which methods are expected to work. This isn't even about testing yet, this is about deprecating code that has possibly unknown failure cases.

They are in the code because they are expected to work. All new methods have possible unknown failure cases -- that's why we need to complete the rigorous testing that previously didn't get finished because it was low priority.
> In my experience, the different methods sometimes give different answers.

Yes, there are a few methods that are supposed to be fast and approximate. However, all of the approximate methods that are currently in there have ended up failing in the poor-sampling regime (which is exactly where we need good estimators), or aren't numerically robust. I can try to dig up the emails discussing this, but basically it involved running the `confidence_interval` code.
> Most of the implementations can be expressed as linear algebra, so IMHO there's not a huge difference in implementation complexity.

But there are always a bunch of different ways to do the same linear algebra, using different algorithms with different memory and computational requirements -- which is basically what we were exploring (but left in a state that leaves much to be desired, and which you're now cleaning up!).
> I think we can design a synthetic dataset where we generate umbrella sampling data from a large origin-centered harmonic potential U0(x) = (K0/2) (x^2 + y^2 + z^2), with local umbrella biasing potentials on a 3D grid, Uijk(x) = (Kbias/2) [(x-i)^2 + (y-j)^2 + (z-k)^2], with the ijk indices running from -5 to +5. This will give us a K = 1331 state system, with the overlap between states adjustable by choice of Kbias.
>
> I believe the total potential U(x) = U0(x) + Uijk(x) is also harmonic, and it should be possible to generate uncorrelated samples from each state using normal random variates.

I thought I had code for this somewhere -- let me try to dig it out. This test should work decently well for testing purposes. I think we'd like to keep pumping up the number of states until this fails, so we know the large-K limits for both speed and memory usage.
> Did we ever implement the method in Appendix D.2 of the MBAR paper (specifically Eq. D8)? I liked this form, since it only involves manipulation and pseudoinversion of KxK matrices. I believe it requires that all K states be unique, however...

It's supposedly in as method "inverse". I haven't tested performance recently, though.
Here's a speed benchmark for the `svd-ew-kab` method:

```
    name                   K     N       time (s)
0   25x100 exponentials    25    2500    0.00436
1   100x100 exponentials   100   10000   0.0322
2   250x250 exponentials   250   62500   0.485
3   25x100 oscillators     25    2500    0.00337
4   100x100 oscillators    100   10000   0.0288
5   250x250 oscillators    250   62500   0.579
6   500x100 oscillators    500   50000   1.31
7   1000x50 oscillators    1000  50000   3.48
8   2000x20 oscillators    2000  40000   12.2
9   4000x10 oscillators    4000  40000   80.1
10  500x100 exponentials   500   50000   0.967
11  1000x50 exponentials   1000  50000   3.03
12  2000x20 exponentials   2000  40000   13.5
13  4000x10 oscillators    4000  40000   71.8
14  gas-properties         155   240048  1.4
15  8proteins              28    112000  0.107
```
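(For reference, a minimal sketch of how timings like these can be collected. The `MBAR` constructor and the `uncertainty_method` keyword follow the pymbar 2/3-style API and may differ between versions, so treat this as illustrative:)

```python
import time
import pymbar

def time_uncertainty(u_kn, N_k, method='svd-ew-kab'):
    """Time only the uncertainty step for a given covariance method."""
    mbar = pymbar.MBAR(u_kn, N_k)  # free energies are solved up front
    t0 = time.perf_counter()
    mbar.getFreeEnergyDifferences(uncertainty_method=method)
    return time.perf_counter() - t0
```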
For the record, here are some representative timings for the various linear algebra operations on a 5000x5000 matrix. As expected, eigenvalues are cheap, SVD / pinv are expensive.
```
In [3]: %time l,v = np.linalg.eigh(x)
CPU times: user 1min 26s, sys: 1.02 s, total: 1min 27s
Wall time: 23.4 s

In [5]: %time xinv = np.linalg.pinv(x)
CPU times: user 4min 46s, sys: 2.5 s, total: 4min 48s
Wall time: 1min 15s

In [6]: %time u,d,v = np.linalg.svd(x)
CPU times: user 4min 27s, sys: 2.32 s, total: 4min 30s
Wall time: 1min 9s

In [7]: %time xinv = np.linalg.inv(x)
CPU times: user 27.3 s, sys: 373 ms, total: 27.7 s
Wall time: 10.5 s
```
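(A self-contained sketch for reproducing this comparison; note that `np.linalg.eigh` assumes a symmetric input, so the random test matrix is symmetrized first. The 5000x5000 size matches the timings quoted above:)

```python
import time
import numpy as np

n = 5000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
x = (a + a.T) / 2  # symmetrize so eigh applies

for label, op in [('eigh', np.linalg.eigh), ('pinv', np.linalg.pinv),
                  ('svd', np.linalg.svd), ('inv', np.linalg.inv)]:
    t0 = time.perf_counter()
    op(x)
    print(label, time.perf_counter() - t0)
```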
Here are the covariance timings with the `inverse` method. Note that this is with the fix to #106, as otherwise these systems would blow up from memory usage.

```
    name                   K     N       time (s)
0   25x100 exponentials    25    2500    0.0175
1   100x100 exponentials   100   10000   0.0347
2   250x250 exponentials   250   62500   0.798
3   25x100 oscillators     25    2500    0.00301
4   100x100 oscillators    100   10000   0.0358
5   250x250 oscillators    250   62500   0.76
6   500x100 oscillators    500   50000   1.71
7   1000x50 oscillators    1000  50000   5.46
8   2000x20 oscillators    2000  40000   16.3
9   4000x10 oscillators    4000  40000   55.5
10  500x100 exponentials   500   50000   1.68
11  1000x50 exponentials   1000  50000   5.63
12  2000x20 exponentials   2000  40000   15.1
13  4000x10 oscillators    4000  40000   45.7
14  gas-properties         155   240048  1.9
15  8proteins              28    112000  0.143
```
Conclusion: the more general `svd-ew-kab` method is at most about 2X slower than the method that relies on the numerical inverse (and requires unique states), so there's not much lost by defaulting to the fully general method.

Also, with the `inverse` method, these systems all give warnings about singularity, despite having unique states:

```
Warning: W'W appears to be singular, yet 'inverse' method of uncertainty estimation requires W contain no duplicate states.
```
Things are looking good for the `svd-ew-kab` method! I guess you may be right that we can get by with one approach.

If we ditch the other methods, we could always go back and pull them back in if we find new tests start failing. But I like the "one method" approach more and more.
How much of a departure is this scheme from Appendix D.1?
It's actually discussed in that appendix, equation D5.
Thanks for making these measurements!
`svd-ew-kab` is just the old `svd-ew` with the numpy `linalg.pinv` instead of the homebrew pseudoinverse, correct?

How does `inverse` scale in terms of memory usage? That's the one place where it seems like it might have an advantage if N is large.

How similar are the results of `inverse` and `svd-ew-kab`? Are we applying an overly sensitive test of singularity, or are the results actually affected significantly by numerical error?
Yes, `kab` is just the `svd-ew` + `np.linalg.pinv`.
`inverse` and `svd-ew` should have the same memory footprint--both operate entirely on KxK matrices.
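(To put rough numbers on that: a quick back-of-the-envelope check, where K = 1331 comes from the 3D umbrella example above and 50 samples per state is an illustrative assumption:)

```python
# K x K vs N x K storage at float64 (8 bytes per entry).
K = 1331                       # states, from the 3D umbrella example above
N = K * 50                     # assumed 50 samples per state (illustrative)
print(K * K * 8 / 1e6, "MB")   # K x K matrix: ~14 MB
print(N * K * 8 / 1e9, "GB")   # N x K matrix: ~0.7 GB
```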
I can't comment on the accuracy / singularity yet, haven't looked into it.
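(When we do look into it, one possible starting point is to inspect the conditioning of W'W directly, since that's the matrix the singularity warning refers to. A hypothetical diagnostic, assuming the weight matrix `W` is at hand:)

```python
import numpy as np

wtw = W.T @ W  # the K x K matrix whose near-singularity triggers the warning
print("condition number:", np.linalg.cond(wtw))
print("rank:", np.linalg.matrix_rank(wtw), "of", wtw.shape[0])
```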
For example: `method == 'tan-HGH'` doesn't seem to be working that well. Is it a bug, or just not a very good method for estimating variance?
There probably should be ~2 methods -- the generally fastest one, and the generally most robust one.