alan-turing-institute / ThermodynamicAnalyticsToolkit

Sampling-based approach to analyse neural networks using TensorFlow
https://alan-turing-institute.github.io/ThermodynamicAnalyticsToolkit/
GNU General Public License v3.0

Comments on testing examples from documentation: Simulation #46

Closed ZofiaTr closed 5 years ago

ZofiaTr commented 5 years ago

Describe precisely the difference between the frames returned by fit():

run_info, trajectory, averages = nn.fit()

Maybe make averages and trajectory optional?

ZofiaTr commented 5 years ago

Small remark: I would rather use display(run_info) than print(np.asarray(run_info[-10::])), as it helps to see the keys of the frame.
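
To illustrate what I mean, a hypothetical snippet (assuming the three frames come back as pandas DataFrames):

# hypothetical inspection of the frames returned by nn.fit(),
# assuming they are pandas DataFrames
run_info, trajectory, averages = nn.fit()

print(run_info.columns)   # the keys/columns of the frame
print(run_info.tail(10))  # last ten rows, analogous to run_info[-10::]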

ZofiaTr commented 5 years ago

I like 2.2.3.1 Provide your own dataset.

ZofiaTr commented 5 years ago

This part is a bit misleading:

Nonetheless, optimization is always the initial step to sampling as we need our random starting configuration to touch base with the loss manifold. If we start at a hill, then the sampler will accumulate a lot of momentum when going downhill which will carry him on for a quite a while. In contrast, we want the sampler to move around by thermal noise. Therefore, we want to start at a point where gradients are small, best close to zero, and simply use random initial momenta.

In general, when sampling from a distribution (to compute empirical averages for example), one wants to start 'close to equilibrium', i.e. from states which are of high probability with respect to the target distribution (therefore the minima of the loss). The initial optimisation procedure is therefore a first guess to find such states, or at least to get close to them. In molecular dynamics, it is common to run sampling during an "equilibration period" in order to let the system relax to its equilibrium. During this equilibration time, the generated samples are not used for computing averages, as they would introduce a large statistical error.
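
As a hypothetical illustration of discarding such an equilibration period before averaging (assuming the trajectory.csv layout used later in the guide):

import pandas as pd

# read the sampled trajectory written during sampling
df_trajectory = pd.read_csv("trajectory.csv", sep=',', header=0)

# discard an assumed burn-in/equilibration period before computing averages
burn_in = 200   # hypothetical number of initial steps to drop
equilibrated = df_trajectory.iloc[burn_in:]

# empirical average of the loss over the equilibrated samples only
print(equilibrated['loss'].mean())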

ZofiaTr commented 5 years ago

Using a prior: we should formalize the algorithm used for priors and add formulas.

ZofiaTr commented 5 years ago

Figure 2.4: Sampled weights: Plot of first against second weight:

Since we are talking about sampling, could we provide plots of the distributions as well? Can we compare the two cases, with and without a prior?
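
For instance, something along these lines might do (a sketch only, assuming the trajectory.csv columns weight0 and weight1 from the guide):

import pandas as pd
import matplotlib.pyplot as plt

# read the sampled trajectory
df_trajectory = pd.read_csv("trajectory.csv", sep=',', header=0)

# marginal (empirical) distributions of the two sampled weights
plt.hist(df_trajectory['weight0'], bins=50, alpha=0.5, label='weight0')
plt.hist(df_trajectory['weight1'], bins=50, alpha=0.5, label='weight1')
plt.xlabel('weight value')
plt.ylabel('count')
plt.legend()
plt.savefig('weight_distributions.png', bbox_inches='tight')

The same plot produced once with and once without the prior would allow a direct comparison.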

ZofiaTr commented 5 years ago

Wrong index in 2.2.5.1 Averages?

I think the plot is also plotting step over step, since step is the second column in df_trajectory.

ZofiaTr commented 5 years ago

We need to rewrite:

Averages are interesting when the goal is to see whether a sampler indeed maintains the desired temperature. For inspecting the loss manifold, we need different analysis methods. Especially, we need ways of describing the minima basins and their shape. One such possibility are so-called collective variables, namely low-dimensional functions that describe the principal shape of the manifold of a minima. Another import property would be directions to get from one minima to another. This also can be encoded in a collective variable.

ZofiaTr commented 5 years ago

2.2.5.2 Diffusion Map: you are only looking at weight0, which misses the point. We need to review the commentary in the documentation.

Please replace the visualisation code with:

import pandas as pd
import matplotlib

# use agg as backend to allow command-line use as well
matplotlib.use("agg")
import matplotlib.pyplot as plt

from TATi.TrajectoryAnalyser import compute_diffusion_maps

# option value coming from the sampling (inverse temperature used by the sampler)
inverse_temperature = 1e4

# read the sampled trajectory
df_trajectory = pd.read_csv("trajectory.csv", sep=',', header=0)

loss = df_trajectory.loc[0::, ['loss']].values

# select the parameter columns "weight0" and "weight1"
trajectory = df_trajectory[['weight0', 'weight1']].values

num_eigenvalues = 2
vectors, values, q = compute_diffusion_maps(
    traj=trajectory,
    beta=inverse_temperature,
    loss=loss,
    nrOfFirstEigenVectors=num_eigenvalues,
    method="vanilla",
    use_reweighting=False)

# first against second diffusion coordinate, colored by the first
plt.figure()
plt.scatter(vectors[:, 0], vectors[:, 1], c=vectors[:, 0], edgecolor='k')
plt.xlabel('1st Diffusion Coordinate')
plt.ylabel('2nd Diffusion Coordinate')
plt.savefig('eigenvectors.png', bbox_inches='tight')

# weight0 against the first diffusion coordinate, colored by the second
plt.figure()
plt.scatter(trajectory[:, 0], vectors[:, 0], c=vectors[:, 1], edgecolor='k')
plt.xlabel('weight0')
plt.ylabel('1st Diffusion Coordinate')
plt.colorbar(label='2nd Diffusion Coordinate')
plt.savefig('weight0_vs_first_dc.png', bbox_inches='tight')

# weight1 against the first diffusion coordinate, colored by the second
plt.figure()
plt.scatter(trajectory[:, 1], vectors[:, 0], c=vectors[:, 1], edgecolor='k')
plt.xlabel('weight1')
plt.ylabel('1st Diffusion Coordinate')
plt.colorbar(label='2nd Diffusion Coordinate')
plt.savefig('weight1_vs_first_dc.png', bbox_inches='tight')

ZofiaTr commented 5 years ago

Images for diffusion maps (three attached figures: first, second, third).

It would be good to show this kind of visualisation for a longer trajectory; I only did 1000 steps.

FrederikHeber commented 5 years ago

Thanks for the detailed feedback!

As a small note: I have reformatted the large code block (you can use triple back-ticks in GitHub's markdown for bracketing large code blocks, see Syntax highlighting).

FrederikHeber commented 5 years ago

You mentioned some smaller stuff which I will fix right away.

Moreover, you mention some more involved issues:

  1. return values of fit() and sample(): I am unhappy with this myself but so far have not had a good idea what the correct thing to return from fit() is. Is it simply the trajectory? One could also return a structure/class that contains all three (see the sketch after this list). What would you say to this?

    values = nn.fit()
    values.trajectories
    values.run_info
    values.accuracy
  2. misleading part on optimization before sampling: This depends very much on the audience of the guide. I imagined that if folks do not have a molecular dynamics background, they will not know about the necessity to first get close to equilibrium. The way I described it, I tried to frame it as a purely technical aspect (which is also true): far from equilibrium, gradients will be large and dominate over control through temperature-induced noise. Hence, it will take a much longer time before admissible samples can be taken. I could turn your explanations into an extra note hinting at the theoretical aspects.

  3. Priors need some formalization in general (also in the code). So far, they are sort of make-shift. I will add formulas to make it clearer. However, this is probably an issue larger than the simulation interface.

  4. "We need to rewrite": What exactly don't you like about the introduction of collective variables concept. Or is just too sloppy? (e.g., low-dimensional function should rather be low-dimensional embedding/mapping, "another import property" comes too abrupt and may need its own paragraph)

  5. "2.2.5.2 Diffusion Map": No, I am taking "weight0" as the first index and then addressing [index:], i.e. from index 0 onwards all till the end. I think you have missed the extra double-colon.

  6. "please replace the code ...": Could you provide some text to put in the documentation explaining the figures? I can make figures with longer max_steps, no problem.

ZofiaTr commented 5 years ago

Replying to the above:

  1. Indeed a very good idea!

  2. I don't agree that it depends on the audience, because your point of view is purely deterministic and I think it contains some false statements. I agree that we don't need to include the molecular dynamics point of view; I meant it as an example in this discussion, so that we can agree on the right motivation. An example:

    we want the sampler to move around by thermal noise

We don't have pure Brownian motion but Langevin dynamics, which also contains the gradients of the loss function.
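
For reference, a standard form of (underdamped) Langevin dynamics, with unit mass, friction $\gamma$, inverse temperature $\beta$ and loss $L$:

$$dq_t = p_t\,dt, \qquad dp_t = -\nabla L(q_t)\,dt - \gamma\,p_t\,dt + \sqrt{2\gamma\beta^{-1}}\,dW_t,$$

so the gradient of the loss always enters alongside the thermal noise.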

  3. Great! I would either hide this functionality or add more details of the algorithm you are using to the documentation.

  4. I am saying "we" because I am interested in your opinion, but I can do it later. Putting it on my to-do list. Let me give you some examples:

    Averages are interesting when the goal is to see whether a sampler indeed maintains the desired temperature.

In general, averages are the main goal of sampling.

, we need ways of describing the minima basins and their shape. One such possibility are so-called collective variables, ..

I wouldn't say that: collective variables is a term used for simple quantities that describe phenomena at certain time scales. In chemistry they are also called reaction coordinates: wiki. Diffusion maps are used to parametrize the underlying manifold.

  5. Ok, sorry.

  6. I will!

FrederikHeber commented 5 years ago

These are more or less incorporated into the userguide. Hence, I am closing this ticket.

ZofiaTr commented 5 years ago

Modified the diffusion map part in the userguide as discussed above (pushed to the Userguide_samplers branch).

FrederikHeber commented 5 years ago

I had assumed that item 5 was simply a misunderstanding on your part. However, I have noticed that the code you supplied for 2.2.5.2 was still present in the Userguide_Samplers branch.

I have removed this commit as it did not really improve the userguide. Only one of the three created figures is actually shown in the guide and none of them are explained. Instead, you simply removed the text I wrote on the two basins. My old text actually connected to the introduction of the guide, which the new version no longer does.