WardLT opened this issue 2 years ago.
@sutanay has some ideas on this that he has already discussed with Jenna.
@WardLT - this would be updating the MPNN to systematically add training samples using the "curiosity" objective. Here was the idea:

```python
def train_energy_prediction_model(model):
    """
    Train an energy prediction model with a simulation-driven approach combined
    with self-supervised learning.

    The procedure samples the chemical space and adaptively generates molecular
    graphs that represent points in the chemical space where the current model's
    performance is suboptimal. Each graph is transformed into a 3D representation
    in which every node is associated with an (x, y, z) coordinate. TTM processes
    this 3D representation to compute the potential energy, and a graph neural
    network (`model`) is trained on the collected (graph, energy) pairs.
    """
    min_size, max_size = 3, 30
    # Bigger values of n lead to greater diversity in graph structures
    for n in range(min_size, max_size + 1):
        pause_training = False
        max_iters = 1000
        n_iters = 0
        while not pause_training and n_iters < max_iters:
            # On the first iteration, return sample structures from the current
            # database; afterwards, generate the dataset by sampling the chemical space.
            # get_db_graphs() and gen_candidates() each return a set of (graph, energy) pairs.
            # Use techniques from the "Curiosity in exploring chemical space: intrinsic
            # rewards for deep molecular reinforcement learning" paper to implement
            # gen_candidates().
            graphs = get_db_graphs(n) if n_iters == 0 else gen_candidates(model, n)
            train_graphs, valid_graphs, test_graphs = split_dataset(graphs)
            train_and_eval_model(model, train_graphs, valid_graphs, test_graphs)
            # Stop iterating at this size once the model has stabilized
            pause_training = eval_model(model, n, test_graphs)
            n_iters += 1
    return model
```
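The loop leaves `split_dataset` and the "has the model stabilized?" check inside `eval_model` undefined. Below is a minimal sketch of both. It assumes the dataset is a plain list of `(graph, energy)` pairs and that "stabilized" means the best test MAE has stopped improving by more than a tolerance over the last few iterations. The names `has_stabilized`, `patience`, `tol`, and `fracs` are hypothetical and do not come from an existing codebase.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    pairs: Sequence[T],
    fracs: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Randomly split (graph, energy) pairs into train/valid/test lists (hypothetical helper)."""
    items = list(pairs)
    random.Random(seed).shuffle(items)  # seeded shuffle so splits are reproducible
    n_train = int(fracs[0] * len(items))
    n_valid = int(fracs[1] * len(items))
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test


def has_stabilized(mae_history: Sequence[float], patience: int = 3, tol: float = 1e-3) -> bool:
    """Assumed stopping rule: True when the best test MAE of the last `patience`
    iterations improves on the earlier best by less than `tol`."""
    if len(mae_history) <= patience:
        return False  # not enough history to judge yet
    best_before = min(mae_history[:-patience])
    best_recent = min(mae_history[-patience:])
    return best_before - best_recent < tol
```

With this rule, `eval_model` would append the latest test MAE to a history list and return `has_stabilized(history)`; `patience` trades off wasted iterations against stopping on a temporary plateau.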
I'm not sure how to do this, but there has to be some way of forcing the network to be "curious" in training.
Something to talk about in our next meeting, perhaps!
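One common way to make the sampler "curious" is to score each candidate graph by how much an ensemble of independently trained models disagrees on its predicted energy, plus a bonus for structures that have rarely been labeled, and to send only the top-scoring candidates to TTM. This combines query-by-committee active learning with a count-based exploration bonus. It is swapped in here as an illustration and is not necessarily the intrinsic reward used in the Curiosity paper. The sketch assumes you already have an `(n_models, n_candidates)` array of ensemble predictions and a hashable key per candidate (e.g. a canonical graph hash). `curiosity_scores`, `select_curious`, and `beta` are hypothetical names.

```python
from collections import Counter
from typing import Hashable, List, Sequence

import numpy as np


def curiosity_scores(
    ensemble_preds: np.ndarray,
    keys: Sequence[Hashable],
    seen: Counter,
    beta: float = 0.1,
) -> np.ndarray:
    """Score candidates by ensemble disagreement plus a count-based novelty bonus.

    ensemble_preds: shape (n_models, n_candidates), predicted energies.
    keys: one hashable identifier per candidate (assumed, e.g. a graph hash).
    seen: how many times each key has already been labeled by TTM.
    """
    disagreement = ensemble_preds.std(axis=0)  # spread across models as an uncertainty proxy
    counts = np.array([seen[k] for k in keys], dtype=float)  # unseen keys count as 0
    novelty = beta / np.sqrt(counts + 1.0)  # bonus decays as a structure is revisited
    return disagreement + novelty


def select_curious(
    ensemble_preds: np.ndarray,
    keys: Sequence[Hashable],
    seen: Counter,
    k: int = 2,
    beta: float = 0.1,
) -> List[int]:
    """Return indices of the k highest-scoring candidates, best first."""
    scores = curiosity_scores(ensemble_preds, keys, seen, beta)
    return [int(i) for i in np.argsort(-scores)[:k]]


# Usage: three models agree on candidates "a" and "c" but disagree on "b".
preds = np.array([[1.0, 2.0, 3.0],
                  [1.0, 2.5, 3.0],
                  [1.0, 1.5, 3.0]])
picked = select_curious(preds, ["a", "b", "c"], Counter({"a": 3}), k=2)
```

Here `picked` is `[1, 2]`: "b" wins on disagreement, and "c" outranks "a" only because "a" has already been labeled three times. After labeling, incrementing `seen` for the chosen keys keeps the sampler from revisiting them.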