@yanxon
1. I'm not sure you'd necessarily have to change your algorithm, but you would have to precompute the values you currently get by calling methods on pymatgen objects (such as neighbors or species) and store them in arrays before passing them to a numba function.
There are also a lot of functions called within these functions, so you would need to start at the lowest level of computation and rewrite everything in the call tree as a numba function. Say you wanted to compile G5_prime: you would have to compile every function that gets called while computing G5_prime, and the functions called within those functions, until you're down to plain Python and NumPy operations.
Also, numba functions work most reliably with NumPy arrays and plain numbers (integers, floating point numbers, and complex numbers); support for other types such as tuples, strings, and numba's own typed lists and dicts is more limited. Ordinary Python dictionaries and lists of arbitrary objects would have to go. You won't be able to pass pymatgen objects (or other arbitrary Python objects) to a numba function either. This may cause trouble where you retrieve your symmetry operations from the dictionary.
2. Numba doesn't directly support compiling a method of a class, but what you can do, as I did in the bispectrum, is write the function outside of the class and then write a method in the class that calls that function with input parameters taken from the class.
3. The only other speedup option I know of would be to write C/C++/Fortran functions and call them from Python, but that would be a lot more work.
Numba would be the way to go, and I'm happy to help here. I'm confident we wouldn't have to change the entire flow of the code, but we would have to change where certain data structures are used, and do a lot of precomputing to store values in indexable arrays for the calculation. This wouldn't be too difficult, since we can use your existing for loops to index these arrays.
The process would be to rewrite each method you want to compile as a standalone function that uses only numba-compatible data types. Then we would write wrapper methods within the class that pull the data we need from the pymatgen objects, dictionaries, lists, etc., store it in arrays, and call the numba function.
@David-Zagaceta,
Thank you for the suggestion.
I agree that we need to extract information from the Pymatgen objects. That part is actually very easy to do. The tough part is matching which information needs to be passed to which function.
However, how do you declare a string type in eager compilation mode? For example, numba.f8[:] declares a 1-D array of floats.
@David-Zagaceta,
I think I solved this puzzle. The reason I need a string as an argument is that I need to compare the species of two atoms, e.g., Na and Cl: if the two species differ, no calculation is needed.
It seems that Numba does not accept a list of lists of strings as an argument. (If you know how to make Numba accept this type of argument, please let me know.)
If not, I can give 'Na' and 'Cl' a numerical representation based on their number of protons, i.e., their atomic numbers.
@yanxon
You would have to feed in the string as a 1-D array of characters. The numba data type for a single character is
numba.i1
(an 8-bit integer, so each character is stored as its byte code); for a 1-D array of characters, the data type would be
numba.i1[:]
Feeding the string
arg = "Na"
to a function would then result in an indexable array where
arg[0] corresponds to 'N'
arg[1] corresponds to 'a'
Hope that helps
EDIT:
Make sure you pass unicode strings from Python to numba.
The numba documentation lists the string functions, attributes and methods that are currently supported, for example:
len()
Hi @David-Zagaceta,
I tried your suggestion, and I get this error:
TypeError: No matching definition for argument type(s) array(float64, 1d, C), reflected list(array(float64, 1d, C)), reflected list(unicode_type), unicode_type, float64, float64, float64
Please look here: https://github.com/qzhu2017/PyXtal_FF/blob/9b965624d4737de7eda8eba477579b9350092f2b/mapp/descriptors/new_bp.py#L235
This compiles fine. However, if you uncomment lines 233 and 234 instead, the code gives me that error. Can you please take a look at this eager compilation issue?
Thanks,
Howard
@qzhu2017,
The code definitely performs much better with Numba than without it. I am calculating one Si structure with 64 atoms per unit cell. The symmetry function parameters are:
symmetry1 = {'G2': {'eta': [0.036, 0.071, 0.179, 0.357, 0.714, 1.786, 3.571, 7.142, 17.855]},
             'G4': {'lambda': [-1, 1], 'zeta': [1], 'eta': [0.036, 0.071, 0.179, 0.357, 0.714, 1.786, 3.571, 7.142, 17.855]}}
Rc = 5.2
Here is the result of the calculations:
Calculating the gaussian symmetry functions with new script.....
The calculation time with Numba: 57.60 s
Calculating the gaussian symmetry functions with old script.....
The calculation time without Numba: 1729.2758746147156 s
Check the equivalence of old and new descriptors......
The old and new behlerparrinello descriptors are equal:
True
@David-Zagaceta, this Numba performance is without eager compilation, and I think it is already very good. Do you think it would perform significantly better with eager compilation?
@David-Zagaceta
To follow up on my question in the email: we need to use the Numba decorator, since the code calculates the Gaussian functions slowly. I would like to use Numba here: https://github.com/qzhu2017/PyXtal_FF/blob/d69767aabd90c2e2515a9fe83b80c45001632d65/mapp/descriptors/behlerparrinello.py#L60
and here:
https://github.com/qzhu2017/PyXtal_FF/blob/d69767aabd90c2e2515a9fe83b80c45001632d65/mapp/descriptors/behlerparrinello.py#L1507
Thanks,
Howard