OUnke / SpookyNet

Reference implementation of "SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects"
MIT License

Training code #1

Open nickhine opened 2 years ago

nickhine commented 2 years ago

Hi

Really glad to see this is available: I've been keen to give it a go since the paper appeared on arXiv. I've had some success previously training useful models with PhysNet (as yet unpublished work), and this looks even better.

I cloned this repository and checked that I could get the "example.py" code to run, and it did, so it seems I have a Python environment that can cope with SpookyNet.

Perhaps I am missing something, but I don't see, at the moment, any code here relating to training, e.g. the definition of the loss function, loading the training data, or optimising the parameters with AMSGrad. In PhysNet you had a train.py which was straightforward to use.

Is that something you'll be adding at some point? It's quite difficult to work out how to use this without it.

Thanks

Nick Hine University of Warwick

nickhine commented 2 years ago

@OUnke just wondering if you have any plans to release the training code?

shubbey commented 2 years ago

Hi Nick, I recently went through the same process to get this training on a custom network. If you look at example.py, the author shows how to encode your molecules to generate energies etc. (this is mostly in spookynet_calculator.py, which you don't need to use directly but can learn from). Training is basically the same thing: you want to put your molecules in a batch via this encoding, generate the energies, and then compare those energies against your known values to compute a loss function.

Here's some rough code to give you an idea. It's pruned from a more complicated project of mine, so it probably has a bunch of errors, and obviously you want to do things like store the current state so you can pause training etc., but that's the same for all of these projects. There are also a bunch of SpookyNet-specific options if you want to incorporate charges or dipoles into your loss function -- the paper explains this.

I'm pretty impressed with this model so far, at least compared to others I've experimented with, such as ANI and PhysNet (which is by the same person, but this code is more modern). Hopefully it helps. Good luck.

import math
import random

import torch

from spookynet import SpookyNet # as in the repository's example.py

class SpookyBatch:
    device = torch.device('cuda')

    def __init__(self):
        self.N = 0
        self.Z = []
        self.R = []
        self.E = []
        self.idx_i = []
        self.idx_j = []
        self.batch_seg = []

    def toTensor(self):
        self.Z = torch.tensor(self.Z,dtype=torch.int64,device=SpookyBatch.device)
        self.R = torch.tensor(self.R,dtype=torch.float32,device=SpookyBatch.device,requires_grad=True)
        self.Q = torch.zeros(self.N,dtype=torch.float32,device=SpookyBatch.device) # not using this so could just pass the same tensor around
        self.S = torch.zeros(self.N,dtype=torch.float32,device=SpookyBatch.device) # ditto
        self.E = torch.tensor(self.E,dtype=torch.float32,device=SpookyBatch.device)
        self.idx_i = torch.tensor(self.idx_i,dtype=torch.int64,device=SpookyBatch.device)
        self.idx_j = torch.tensor(self.idx_j,dtype=torch.int64,device=SpookyBatch.device)
        self.batch_seg = torch.tensor(self.batch_seg,dtype=torch.int64,device=SpookyBatch.device) # int64 required for "index tensors"
        return self
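
# After toTensor(): Z is (total_atoms,), R is (total_atoms, 3), E, Q and S are
# (num_molecules,), batch_seg is (total_atoms,), idx_i/idx_j are (num_pairs,).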

def load_batches(my_mols): # my_mols == some structure which has your loaded mol data, prob retrieved from a file,
                           # or you can load it from a file here on demand to save memory
    batches = []
    batch = None
    nm = 0   # how many mols in current batch
    NM = 100 # how many mols we put in each batch
    for m in my_mols:
        if nm == 0:
            na = 0 # num total atoms in this batch
            batch = SpookyBatch() # stores the data in a format we can pass to SpookyNet

        batch.Z.extend(m.species)
        batch.R.extend(m.coords)
        batch.E.append(m.energy) # target energy
        cur_idx_i,cur_idx_j = get_idx(m.coords) # see below, but also look at SpookyNetCalculator for more options
        cur_idx_i += na # offset the pair indices by the atoms already in this batch
        cur_idx_j += na
        batch.idx_i.extend(cur_idx_i)
        batch.idx_j.extend(cur_idx_j)
        batch.batch_seg.extend([nm]*len(m.species))
        na += len(m.species)
        nm += 1

        if nm >= NM:
            batch.N = nm
            batches.append(batch.toTensor()) # or you could convert to a tensor during training, depends on how much memory you have
            batch = None # so a full batch is not appended a second time below
            nm = 0
    if batch: # leftover molecules that didn't fill a whole batch
        batch.N = nm
        batches.append(batch.toTensor())
    return batches

# taken from SpookyNetCalculator 
def get_idx(R):
    N = len(R)
    idx = torch.arange(N,dtype=torch.int64)
    idx_i = idx.view(-1, 1).expand(-1, N).reshape(-1)
    idx_j = idx.view(1, -1).expand(N, -1).reshape(-1)
    # exclude self-interactions
    nidx_i = idx_i[idx_i != idx_j]
    nidx_j = idx_j[idx_i != idx_j]
    return nidx_i.numpy(),nidx_j.numpy() # kind of dumb converting to numpy when we use torch later, but it fits our model
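
# For example (illustrative), for a 3-atom molecule get_idx returns every
# ordered atom pair, excluding self-interactions:
#   idx_i = [0, 0, 1, 1, 2, 2]
#   idx_j = [1, 2, 0, 2, 0, 1]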

def train():
    NUM_EPOCHS = 1000
    START_LR = 1e-3 # not defined in the original snippet; pick what works for your data
    BEST_POINT = 'best.pt'

    model = SpookyNet().to(torch.float32).cuda()
    model.train()

    optimizer = torch.optim.Adam(model.parameters(),lr=START_LR,amsgrad=True)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=25, threshold=0)

    training = load_batches(my_training_mols)     # your training set
    validation = load_batches(my_validation_mols) # your validation set
    mse_sum = torch.nn.MSELoss(reduction='sum')

    for epoch in range(NUM_EPOCHS):
        random.shuffle(training)

        for batch in training:
            N = batch.N
            res = model.energy(Z=batch.Z,Q=batch.Q,S=batch.S,R=batch.R,idx_i=batch.idx_i,idx_j=batch.idx_j,batch_seg=batch.batch_seg,num_batch=N)
            E = res[0]

            loss = mse_sum(E, batch.E)/N

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        rmse = compute_rmse(validation,model)
        if scheduler.is_better(rmse, scheduler.best):
            model.save(BEST_POINT)

        scheduler.step(rmse)
        learning_rate = optimizer.param_groups[0]['lr']
        print('Epoch: {} / LR: {} / RMSE: {:.3f} / Best: {:.3f}'.format(scheduler.last_epoch,learning_rate,rmse,scheduler.best))

def compute_rmse(batches,model):
    mse_sum = torch.nn.MSELoss(reduction='sum')
    total_mse = 0.0
    count = 0
    model.eval()
    with torch.no_grad(): # energies only here, so no graph is needed during validation
        for batch in batches:
            N = batch.N
            res = model.energy(Z=batch.Z,Q=batch.Q,S=batch.S,R=batch.R,idx_i=batch.idx_i,idx_j=batch.idx_j,batch_seg=batch.batch_seg,num_batch=N)
            E = res[0]

            # sum the squared errors over all molecules in the batch
            total_mse += mse_sum(E, batch.E).item()
            count += N

    model.train()
    return math.sqrt(total_mse / count)
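
For reference, here's a rough sketch of what loading my_mols from an extended xyz file could look like. The Mol container and the column layout are assumptions about your data, not anything SpookyNet prescribes, so adjust the parsing to match your files:

from dataclasses import dataclass

@dataclass
class Mol: # hypothetical container with the fields load_batches expects
    species: list # atomic numbers, one per atom
    coords: list  # [[x, y, z], ...], one triple per atom
    energy: float # target energy for the molecule

SYMBOL_TO_Z = {'H': 1, 'C': 6, 'N': 7, 'O': 8} # extend for your elements

def load_xyz(path):
    mols = []
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                break
            n_atoms = int(line)
            # assumes the energy is the first field of the comment line
            energy = float(f.readline().split()[0])
            species, coords = [], []
            for _ in range(n_atoms):
                parts = f.readline().split()
                species.append(SYMBOL_TO_Z[parts[0]])
                coords.append([float(x) for x in parts[1:4]])
            mols.append(Mol(species, coords, energy))
    return mols
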
DGKang234 commented 2 years ago

Hi shubbey, @shubbey

Thank you for sharing your customized training code! I really appreciate it. Your code is very straightforward.

I have a few questions. I know the code is example code for training, so some features are missing, but I just want to confirm a few very basic things: for {my_mols}, do I pass the extended xyz file which contains many molecules? If so, I should add a few lines specifying which column(s) hold the coordinates, species, energy, etc.

If I want to train on nanoclusters, how should I add a flag so the program knows the input system is a cluster rather than a periodic system? For example, Lattice="0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0" is required before the "Properties" field for the GAP ML-FF.

Thank you DGKang234

shubbey commented 2 years ago

SpookyNet has you pass the entire system as one-dimensional tensors mapped to atoms (or N×3 for the coordinates). So suppose you have two different water molecules Z1 = [1 1 8], R1 = [[x11,y11,z11],[x12,y12,z12],[x13,y13,z13]] and Z2 = [1 1 8], R2 = [[x21,y21,z21],[x22,y22,z22],[x23,y23,z23]], and so forth for any other parameters. Then when you pass in the data, you'd merge these tensors as Z = [1 1 8 1 1 8] and R = [[x11,y11,z11],[x12,y12,z12],[x13,y13,z13],[x21,y21,z21],[x22,y22,z22],[x23,y23,z23]]. You also pass in batch_seg, which says which atom corresponds to which molecule, so in this case it'd be batch_seg = [0 0 0 1 1 1].
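
To make that concrete, here is a minimal sketch of the merge in code (the coordinate values are placeholders):

import torch

Z1, Z2 = [1, 1, 8], [1, 1, 8] # two water molecules: H, H, O
R1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]] # placeholder coords
R2 = [[5.0, 0.0, 0.0], [5.0, 0.0, 1.0], [5.0, 1.0, 0.0]]

Z = torch.tensor(Z1 + Z2, dtype=torch.int64)   # [1, 1, 8, 1, 1, 8]
R = torch.tensor(R1 + R2, dtype=torch.float32) # shape (6, 3)
batch_seg = torch.tensor([0]*3 + [1]*3, dtype=torch.int64) # atom -> molecule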

As for the lattice stuff, I'm really not familiar with it, but you can probably look at their example and follow the code to see how different things map. They provide some different options for the idx_i, idx_j input, which is probably what you are interested in. Good luck!

zhengcheng233 commented 2 years ago

Hi shubbey, @shubbey Thank you for sharing your customized training code! Do you know how this code could be extended to train on energy, forces, and dipole simultaneously (like Eq. 28 in the author's paper)? I guess we can define a loss function including energy, forces, and dipole. However, I am not sure whether, during training, the electrostatic energy can be removed automatically using the charge and vdW information if I only define the loss function with energy, force, and dipole.

shubbey commented 2 years ago

Hi @zhengcheng233, Unfortunately I haven't played with training multiple parameters simultaneously, so I can't really give you good advice here. In my experience with these things, you can usually just define a generalized loss function as long as you put the terms on the same scale. I guess you'll have to experiment to see what kind of results you get. Sorry I can't be of more help.
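
That said, a generic multi-target loss along the lines of Eq. 28 usually looks something like the sketch below. This is only a sketch, not SpookyNet's own training code: it assumes forces are the negative gradient of the predicted energy with respect to the positions (so batch.R must have requires_grad=True, as in the SpookyBatch above), that D is the model's predicted dipole however you extract it from SpookyNet's outputs, and that batch.F and batch.D are hypothetical fields holding your reference forces and dipoles:

import torch

mse_sum = torch.nn.MSELoss(reduction='sum')
W_E, W_F, W_D = 1.0, 1.0, 1.0 # placeholder weights; the right values depend on your units and data

def combined_loss(E, D, batch):
    # forces as the negative gradient of energy w.r.t. positions;
    # create_graph=True so the force error can itself be backpropagated
    F = -torch.autograd.grad(E.sum(), batch.R, create_graph=True)[0]
    loss_e = mse_sum(E, batch.E) / batch.N
    loss_f = mse_sum(F, batch.F) / batch.N # batch.F: reference forces (hypothetical field)
    loss_d = mse_sum(D, batch.D) / batch.N # batch.D: reference dipoles (hypothetical field)
    return W_E * loss_e + W_F * loss_f + W_D * loss_d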