exalearn / hydronet

HydroNet: Benchmark Tasks for Preserving Long-range Interactions and Structural Motifs in Predictive and Generative Models for Molecular Data, at the 34th Conference on Neural Information Processing Systems (NeurIPS), Workshop on Machine Learning and the Physical Sciences [https://arxiv.org/abs/2012.00131]
Apache License 2.0

Create a policy that alternates between purely random and guided moves #9

Open WardLT opened 2 years ago

WardLT commented 2 years ago

I'm not sure how to do this, but there has to be some way of forcing the network to be "curious" in training.

Something to talk about in our next meeting, perhaps!

WardLT commented 2 years ago

@sutanay has some ideas on this that he's already talked to Jenna about

sutanay commented 2 years ago

@WardLT - this would mean updating the MPNN to systematically add training samples, using the "curiosity" objective. Here was the idea.

```python
def train_energy_prediction_model():
    """
    This function trains an energy prediction model in a simulation-driven
    approach combined with self-supervised learning. It samples the chemical
    space and adaptively generates molecular graphs that represent points
    from the chemical space where current model performance is suboptimal.
    Each graph is transformed into a 3D representation where every node is
    associated with an (x, y, z) coordinate. This 3D representation of the
    graph is processed by TTM to compute the potential energy, and a graph
    neural network is trained from the collection of (graph, energy) pairs.
    """
    # `model` and the helper functions called below are not defined in this sketch.
    min_size, max_size = 3, 30
    for n in range(min_size, max_size + 1):
        # n_graphs needs to change with n
        # Bigger values of n will lead to larger diversity in graph structures
        pause_training = False
        max_iters = 1000
        n_iters = 0
        while not pause_training and n_iters < max_iters:
            # If this is the first time, then return sample structures from
            # current database, else generate the dataset by sampling the chemical space.
            # get_db_graphs() or gen_candidates() returns a set of (graph, energy) pairs.
            # Use techniques from the "Curiosity in exploring chemical space: intrinsic
            # rewards for deep molecular reinforcement learning" paper to implement
            # gen_candidates()
            graphs = get_db_graphs(n) if n_iters == 0 else gen_candidates(model, n)

            train_graphs, valid_graphs, test_graphs = split_dataset(graphs)
            train_and_eval_model(model, train_graphs, valid_graphs, test_graphs)

            # Determine when the model has stabilized
            pause_training = eval_model(model, n, test_graphs)
            n_iters += 1
```
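
For the policy named in the issue title, here is a minimal sketch of one possible reading: with some probability take a purely random move, otherwise take the move that a scoring function rates highest. Everything in it is an assumption for illustration and not code from this repository: the name `choose_move`, the `epsilon` parameter, and the `score` callable (which could stand in for a model-based or curiosity-based estimate) are all hypothetical.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

Move = TypeVar("Move")


def choose_move(
    moves: Sequence[Move],
    score: Callable[[Move], float],
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Move:
    """Return a random move with probability `epsilon`, else the best-scoring move.

    Hypothetical helper: `score` is a placeholder for whatever guides the
    search (for example a model prediction or an intrinsic "curiosity" reward).
    """
    if not moves:
        raise ValueError("moves must be non-empty")
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Exploratory step: ignore the scores and pick uniformly at random.
        return rng.choice(moves)
    # Guided step: pick the move the scoring function rates highest.
    return max(moves, key=score)
```

Setting `epsilon=0.0` makes every move guided and `epsilon=1.0` makes every move random, so the balance between the two can be tuned or annealed during training. Something like this could sit inside `gen_candidates()`, though how moves are defined for molecular graphs is left open here.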