Wrapping pynnp into an ASE calculator

mikejwaters commented 2 years ago

I'd like to implement an ASE calculator that uses pynnp for energy/force calculations. Here are a few reasons I want to do this:

I use ASE databases to manage my large set of ML structures
ASE has many structural methods like optimization/MD/phonons/NEB/dimer built-in
ASE can read many common file formats to make evaluated new structures easier

The first step is creating a pynnp.Structure from an ase.Atoms object without performing file IO. So far I have this function:

def make_structure_from_ase_atoms(ase_atoms, elementMap):

    nnp_structure = pynnp.Structure()
    nnp_structure.setElementMap(elementMap)

    for ase_atom in ase_atoms:
        nnp_atom = pynnp.Atom()
        nnp_structure.addAtom(nnp_atom, ase_atom.symbol)
    nnp_structure.isPeriodic = ase_atoms.pbc[0] # should check all of the directions eventually
    nnp_structure.isTriclinic = True

    for i in range(3):
        for j in range(3):
            nnp_structure.box[i][j] = ase_atoms.cell[i][j]

    return nnp_structure

and here is where I am trying to evaluate it using the pynnp example 2:

print_output = True
directory = './'
structure_file = directory + 'input.data'
input_file= directory+'input.nn'
weight_file_format =directory+'weights.%03zu.data'
scaling_file = directory+'scaling.data'

#######
import pynnp

# Initialize NNP setup (symmetry functions only).
m = pynnp.Mode()
m.initialize()
m.loadSettingsFile(input_file)
m.setupElementMap()
m.setupElements()
m.setupCutoff()
m.setupSymmetryFunctions()
m.setupSymmetryFunctionGroups()
m.setupNeuralNetwork()
m.setupSymmetryFunctionScaling(scaling_file)
m.setupNeuralNetworkWeights(weight_file_format)
m.setupSymmetryFunctionStatistics(True, True, True, False)

# Retrieve cutoff radius form NNP setup.
cutoffRadius = m.getMaxCutoffRadius()
print("Cutoff radius = ", cutoffRadius)

# create structure from atoms object
s = make_structure_from_ase_atoms(ase_atoms, m.elementMap)

# Calculate neighbor list.
s.calculateNeighborList(cutoffRadius)

# Calculate symmetry functions for all atoms (use groups).
#m.calculateSymmetryFunctions(s, True)
m.calculateSymmetryFunctionGroups(s, True)

# Calculate atomic neural networks.
m.calculateAtomicNeuralNetworks(s, True)

# Sum up potential energy.
m.calculateEnergy(s)

# Collect force contributions.
m.calculateForces(s)

if print_output:
    print("------------")
    print("Structure 1:")
    print("------------")
    print("numAtoms           : ", s.numAtoms)
    print("numAtomsPerElement : ", s.numAtomsPerElement)
    print("------------")
    print("Energy (Ref) : ", s.energyRef)
    print("Energy (NNP) : ", s.energy)
    print("------------")
    for atom in s.atoms:
        print(atom.index, atom.element, m.elementMap[atom.element], atom.f.r)

This is the error I get:

Traceback (most recent call last):
  File "single_point_comparison_pynnp.py", line 188, in <module>
    m.calculateSymmetryFunctionGroups(s, True)
  File "Mode.pyx", line 69, in pynnp.Mode.calculateSymmetryFunctionGroups
    self.thisptr.calculateSymmetryFunctionGroups(deref(structure.thisptr),
ArithmeticError: ERROR: Number of symmetry function derivatives unset, cannot allocate.

How do I set the number symmetry function derivatives?

Thanks, -Mike

singraber commented 2 years ago

Hello Mike!

Really a great idea to couple n2p2 to ASE.. thanks for having a look at it!

I may have an answer but because I cannot test it, I may be wrong... however, here's my guess of what is wrong:

The error indicates that Atom::numSymmetryFunctionDerivatives is empty, it is passed internally here:

https://github.com/CompPhysVienna/n2p2/blob/72f58c07930ec3da0819c815575a74f12994f318/src/libnnp/Mode.cpp#L1249-L1253

It should not be required to set this manually because the sizes are prepared in Mode::setupSymmetryFunctionMemory... however, I don't see this routine called in your python setup. Hence, my suggestion is to add these two lines, just like in the pynnp example here:

https://github.com/CompPhysVienna/n2p2/blob/72f58c07930ec3da0819c815575a74f12994f318/examples/pynnp/prediction/example-verbose.py#L20-L21

Can you give it a try and report back! Thank you!

Best,

Andreas

mikejwaters commented 2 years ago

Hi Andreas,

Thanks for the reply! When I try either of those functions, I get an attribute error:

Traceback (most recent call last):
  File "single_point_comparison_pynnp.py", line 160, in <module>
    m.setupSymmetryFunctionCache()
AttributeError: 'pynnp.Mode' object has no attribute 'setupSymmetryFunctionCache'

I'm on version 2.1.4 so I'll try the master branch and get back to you.

vsumaria commented 2 years ago

@mikejwaters, I tried this and got the same error: "'pynnp.Mode' object has no attribute 'setupSymmetryFunctionCache'"

I ended up writing something that might be useful. Right now the code is not very clean and needs improvement (some things are a little brute forces which can be improved with help.

import sys
import numpy as np
from ase.calculators.calculator import all_changes
from ase.calculators.calculator import Calculator
import pynnp
import os

class N2P2Calculator(Calculator):

    def __init__(self, **kwargs):
        Calculator.__init__(self, **kwargs)
        self.implemented_properties = {
            'energy' : self.calculate_energy_and_forces,
            'forces' : self.calculate_energy_and_forces}
        self.results = {}

    def calculate(self, atoms=None, properties=['energy'], system_changes=all_changes):
        # Calculator essentially does: self.atoms = atoms
        Calculator.calculate(self, atoms, properties, system_changes)

        has_results = [p in self.results for p in properties]
        if (len(system_changes) > 0) or (not np.all(has_results)):
            if ('energy' in properties) and ('forces' in properties):
                # Forces evaluation requires energy. No need to compute
                # energy twice.
                del properties[properties.index('energy')]
            for p in properties:
                if p in self.implemented_properties:
                    self.implemented_properties[p](self.atoms)
                else:
                    raise NotImplementedError(
                        "Property not implemented: {}".format(p))

    def calculate_energy_and_forces(self, atoms):

        def n2p2_data():
            out_str = ''
            out_str += 'begin\n'
            out_str += 'comment test \n'

            n = len(atoms)
            cell = atoms.get_cell()
            cell_template = 'lattice {:10.6f} {:10.6f} {:10.6f}\n'
            for c in cell:
                out_str += cell_template.format(c[0], c[1], c[2])

                atom_template = 'atom {:10.6f} {:10.6f} {:10.6f} {:2s} {:10.6f} {:10.6f} {:10.6f} {:10.6f} {:10.6f}\n'
                forces = np.zeros([n,3])
            for a in atoms:
                force = forces[a.index]
                out_str += atom_template.format(a.position[0], a.position[1], a.position[2], a.symbol, 0.0, 0.0, force[0], force[1], force[2])

            energy = 0
            out_str += 'energy {:10.6f}\n'.format(energy)
            out_str += 'charge 0.0\n'
            out_str += 'end\n'

            return out_str

        with open('input.data', 'w') as input_data:
            input_data.write(n2p2_data())

        os.system('cp ~/nnp/* .')
        p = pynnp.Prediction()
        p.setup()
        p.readStructureFromFile()
        p.predict()
        s = p.structure

        e_nnp = s.energy
        natoms = s.numAtoms

        f = np.zeros([natoms,3])
        for i,atom in enumerate(s.atoms):
            f[i,:] = atom.f.r

        self.results['energy'] = e_nnp
        self.results['forces'] = f

        os.system('rm weights* input.nn scaling.data input.data')

Basically I used pynnp and created an ASE calculator which is working. But I had to follow the method as discussed in the example-simple.py, and it would be better I can use the verbose method. (I will not need to copy the input.nn, weight* and scalaing files)

mikejwaters commented 2 years ago

Whoa @vsumaria this is a nearly complete ASE fileIO calculator! In other words, how ASE interacts with any program that doesn't have python interface, e.g. VASP. Would you be interested in contributing it to the ASE project?

Off the top of my head, I know that they will want to make the input.data read and write functions separate. I can help too.

mikejwaters commented 2 years ago

@singraber I think its working now. The problem is that these attributes are not in v2.1.4 of n2p2 and are only on the master branch:

 m.setupSymmetryFunctionMemory() # comment out when compiled with N2P2_FULL_SFD_MEMORY 
 m.setupSymmetryFunctionCache() # comment out when compiled with N2P2_NO_SF_CACHE

vsumaria commented 2 years ago

@mikejwaters - yes, we should be able to create a ASE calculator as a pynnp wrapper (which is what I did). Also, I will try to fix that above mentioned problem, possibly just requires the correct version of N2P2.

One thing I am not 100% sure about is, if I were to use this method, can we also parallize calculations with such an ASE calculator. For VASP, we define a separate VASP_COMMAND variable which runs in parallel using the vasp binaries. I am not sure how to implement this here.

mikejwaters commented 2 years ago

@vsumaria @singraber can parallelism be use when doing model evaluations from pynnp? if not, I suspect the command variable in a pure fileIO calculator would have to be something like "mpirun -np 4 nnp-predict" We might end up with two calculators: PynnpCalculator (pure python) and N2p2Calculator (fileIO type). It's not a big deal to have two, LAMMPS has two calculators in ASE for the same reason.

@singraber I started modularizing functions so I can make a pure python calculator. I keep getting NaNs if I have two atoms in a box. Here is my script:

#!/usr/bin/env python3
import pynnp

###################### functions

def make_structure_from_ase_atoms(ase_atoms, elementMap):

    nnp_structure = pynnp.Structure()
    nnp_structure.setElementMap(elementMap)

    for ase_atom in ase_atoms:
        nnp_atom = pynnp.Atom()
        nnp_structure.addAtom(nnp_atom, ase_atom.symbol)

    nnp_structure.isPeriodic = ase_atoms.pbc[0] # I should check all of the directions eventually
    nnp_structure.isTriclinic = True

    for i in range(3):
        for j in range(3):
            nnp_structure.box[i][j] = ase_atoms.cell[i][j]

    return nnp_structure

def setup_nnp_mode(elements, input_file, scaling_file, weight_file_format):

    nnp_element_map = pynnp.ElementMap()
    nnp_structure = pynnp.Structure()

    element_line = ''
    for el in elements:
        element_line += el+' '

    nnp_element_map.registerElements(element_line)
    nnp_structure.setElementMap(nnp_element_map)

    # Initialize NNP setup (symmetry functions only).
    nnp_mode = pynnp.Mode()
    nnp_mode.initialize()
    nnp_mode.loadSettingsFile(input_file)
    nnp_mode.setupElementMap()
    nnp_mode.setupElements()
    nnp_mode.setupCutoff()

    #print('nnp_mode.setupSymmetryFunctions()')
    nnp_mode.setupSymmetryFunctions()
    #print('nnp_mode.setupSymmetryFunctionMemory()')
    nnp_mode.setupSymmetryFunctionMemory()
    #print('nnp_mode.setupSymmetryFunctionCache()')
    nnp_mode.setupSymmetryFunctionCache()
    #print('nnp_mode.setupSymmetryFunctionGroups()')
    nnp_mode.setupSymmetryFunctionGroups()

    nnp_mode.setupNeuralNetwork()
    nnp_mode.setupSymmetryFunctionScaling(scaling_file)
    nnp_mode.setupNeuralNetworkWeights(weight_file_format)
    nnp_mode.setupSymmetryFunctionStatistics(True, True, True, False)

    return nnp_mode

def calculate_on_nnp_structure(nnp_str, nnp_mode):

    # Retrieve cutoff radius form NNP setup.
    cutoffRadius = nnp_mode.getMaxCutoffRadius()
    print("Cutoff radius = ", cutoffRadius)

    # Calculate neighbor list.
    nnp_str.calculateNeighborList(cutoffRadius)

    # Calculate symmetry functions for all atoms (use groups).
    #nnp_mode.calculateSymmetryFunctions(nnp_str, True)
    nnp_mode.calculateSymmetryFunctionGroups(nnp_str, True)

    # Calculate atomic neural networks.
    nnp_mode.calculateAtomicNeuralNetworks(nnp_str, True)

    # Sum up potential energy.
    nnp_mode.calculateEnergy(nnp_str)

    # Collect force contributions.
    nnp_mode.calculateForces(nnp_str)

    print("------------")
    print("Structure:")
    print("------------")
    print("Cell:")
    print('a]',nnp_str.box[0])
    print('b]',nnp_str.box[1])
    print('c]',nnp_str.box[2])
    print("numAtoms           : ", nnp_str.numAtoms)
    print("numAtomsPerElement : ", nnp_str.numAtomsPerElement)
    print("------------")
    print("Energy (Ref) : ", nnp_str.energyRef)
    print("Energy (NNP) : ", nnp_str.energy)
    print("------------")
    for atom in nnp_str.atoms:
        print(atom.index, atom.element, nnp_mode.elementMap[atom.element], atom.f.r)

#############################
from ase import io
from ase.build import bulk
import os
elements = ['Nb', 'Ti', 'Zr', 'O']
#directory = os.path.abspath('../model_000/')+'/'
directory = ''
input_file         = directory+'input.nn'
weight_file_format = directory+'weights.%03zu.data'
scaling_file       = directory+'scaling.data'

# create the mode
nnp_mode = setup_nnp_mode(elements, input_file, scaling_file, weight_file_format)

# a single atom primitive cell
ase_atoms0 = bulk('Ti', 'bcc', a= 3.23678)

# one atom
ase_atoms  = ase_atoms0.copy()
nnp_str = make_structure_from_ase_atoms(ase_atoms, nnp_mode.elementMap)
calculate_on_nnp_structure(nnp_str, nnp_mode)

# changing an atom type
ase_atoms.set_chemical_symbols(['Nb'])
nnp_str = make_structure_from_ase_atoms(ase_atoms, nnp_mode.elementMap)
calculate_on_nnp_structure(nnp_str, nnp_mode)

# stretching the cell
ase_atoms.set_cell(1.5*ase_atoms.get_cell())
nnp_str = make_structure_from_ase_atoms(ase_atoms, nnp_mode.elementMap)
calculate_on_nnp_structure(nnp_str, nnp_mode)

## two atoms # gives NaNs ??
ase_atoms = ase_atoms0.repeat((2,1,1))
io.write('two_atoms.CONTCAR', ase_atoms, format='vasp', vasp5=True)
nnp_str = make_structure_from_ase_atoms(ase_atoms, nnp_mode.elementMap)
calculate_on_nnp_structure(nnp_str, nnp_mode)

which uses these model files created on version 2.1.4 of n2p2. model.zip

mikejwaters commented 2 years ago

@vsumaria Thanks for the inspiration! I've made made an N2P2Calculator based on the fileIOcalculator base class. I'll share it once I've had a chance to thoroughly test it.

vsumaria commented 2 years ago

I am facing some memory issues with my implementation currently when using it for long scale MD etc, trying to understand why that is happening.

Looking forward to your implementation.

mikejwaters commented 2 years ago

@vsumaria I'm currently trying to minimize fileIO so that long runs will be as fast as possible. I'm checking the the ASE people about the canonical way of doing this.

mikejwaters commented 2 years ago

@vsumaria https://gitlab.com/ase/ase/-/merge_requests/2581

vsumaria commented 2 years ago

Updated your n2p2.py file in ase.io to be able to read the entire output.data (your script was only reading the last image). It was easy to update your script.

def read_n2p2(filename='output.data', index=-1, with_energy_and_forces = 'auto' ):
    """Import n2p2 .data file"""
    fd = open(filename,'r') 
    images = list()

    line = fd.readline()
    while 'begin' in line:
        line = fd.readline()
        if 'comment' in line:
            comment = line[7:]
            line = fd.readline()

        cell = np.zeros((3,3))
        for ii in range(3):
            cell[ii] = [float(jj) for jj in line.split()[1:4]]
            line = fd.readline()

        positions = []
        symbols   = []
        charges   = []  # not used yet
        nn        = []  # not used
        forces    = []
        energy = 0.0
        charge = 0.0

        while 'atom' in line:
            sline = line.split()
            positions.append( [float(pos) for pos in sline[1:4]])
            symbols.append(sline[4])
            nn.append(float(sline[5]))
            charges.append(float(sline[6]))
            forces.append( [float(pos) for pos in sline[7:10]])
            line = fd.readline()

        while 'end' not in line:
            if 'energy' in line:
                energy = float(line.split()[-1])
            if 'charge' in line:
                charge = float(line.split()[-1])
            line = fd.readline()

        image = Atoms(symbols = symbols, positions = positions, cell = cell)

        store_energy_and_forces = False
        if with_energy_and_forces == True:
            store_energy_and_forces = True
        elif with_energy_and_forces == 'auto':
            if energy != 0.0 or np.absolute(forces).sum() > 1e-8:
                store_energy_and_forces = True

        if store_energy_and_forces:
            image.calc = SinglePointCalculator(
                            atoms=image,
                            energy = energy,
                            forces = forces,
                            charges = charges)
                            #charge  = charge)
        images.append(image)

    if not index:
        return images
    else:
        return images[index]

mikejwaters commented 2 years ago

Thanks @vsumaria, It should be fixed on the MR now.

mikejwaters commented 2 years ago

@singraber @vsumaria I put together a second ASE calculator that directly uses the PyNNP interface. It also grants easy access to the symmetry functions. Use my MR link for GitLab to check out both calculators.

mikejwaters commented 2 years ago

@vsumaria @singraber The PyNNP interface is MUCH faster. For a small unitcell, I was seeing ~12x speed up!

mikejwaters commented 2 years ago

@singraber I have a pynnp question/feature request. Is there a way to stop pynnp from writing to stdout/stderr? If there isn't a way, can that be added? My log files are being filled with redundant information, in total it was about 1Gb across my current project.

mikejwaters commented 2 years ago

@singraber I noticed that if I don't wrap my atoms back into the unit cell, I get incorrect results using the pynnp interface. I wrap the atoms in the Pynnp calculator I am developing for ASE to work around the problem. But the bug could affect other users!

singraber commented 2 years ago

Hello and sorry for not being responsive these days... I did not have time to look into the ASE calculator yet but it certainly sounds like you are making good progress! Thanks for all your efforts! Regarding your recent questions:

(1) There is a way to stop stdout output in the C++ code and as far as I can see this should also be accessible from the pynnp library. I cannot test this immediately but you could have a look: try if you can toggle the property mode.log.writeToStdout to False. If that does not do what you expect I will have a further look... after all that should be somehow possible.

(2) Atoms are getting wrapped into the box automatically if the structure is generated via readFromFile(), which is usually the case for all nnp tools. However, if you generate it by "manually" filling the element map, atoms, neighbors,... via the python interface then this step is currently missing, yes. It seems I have not put the corresponding function wrapping for Structure::remap into the pynnp code, so you can call it also from the ASE side. This should be very easy, I will do this in the coming days.

singraber commented 2 years ago

Ok, I checked the issues on the latest master:

(1) With setting the mentioned log property to False you can indeed silence the whole stdout output of the n2p2 library, i.e. try the following in your code:

...
m = pynnp.Mode()                                                                
m.log.writeToStdout = False                                                     
m.initialize()                                                                  
m.loadSettingsFile("input.nn")                                                  
m.setupElementMap()
...

Another thing is output via stderr: normally there should not be written much there with one exception being the symmetry function's extrapolation warnings. Usually they should not be ignored, but if you still need to silence them, you can turn them of with the third argument of the mode.setupSymmetryFunctionStatistics() function. Here's the other arguments from the C++ code:

https://github.com/CompPhysVienna/n2p2/blob/6be249e19d1cbabf487753a56a86f9b6395c0bdb/src/libnnp/Mode.h#L188-L207

(2) I just pushed the commit 6be249e to the master branch which makes the remap() function accessible from the pynnp interface. You can call it like that:

...
s = pynnp.Structure()                                                           
s.setElementMap(m.elementMap) 
# .... somehow set up your structure
s.remap() # This will remap all atoms to have coordinates inside the box.
...

Note that this will not happen automatically (only if you fill the structure with readFromFile()) and hence should always be called after all atom coordinates are defined and before the structure is further processed, e.g., by computing the neighbor list.

Hope this helps! All the best, Andreas

CompPhysVienna / n2p2

Wrapping pynnp into an ASE calculator #142