Closed cortner closed 1 month ago
I can see what the output of these functions is, but what is the input? I guess the whole system?
Currently in Molly the first three functions take in the system, the neighbour list and the number of threads but this can be reworked to be just the system.
I prefer potential_energy
to energy
.
My suggestion is
potential_energy(atoms, calc)::Number
forces(atoms, calc)::AbstractVector{<: SVector}
virial(atoms, calc)::AbstractMatrix
but in some ways, this is not that important. it's really easy to re-interpret arrays or change around arguments. First I want shared prototypes.
Of course if we can agree and document what they should return that's even better.
There could also be an
energy_forces_virial(atoms, calc)::Tuple{...}
to compute everything at the same time, that way we don't need to cache things.
What is calc
?
calculator (ASE language). It's the object the calculates things. The potential, etc...
This is a good idea when there starts to be more packages that use forces. With this you only need to implement one interface for a "calculator" package for it to work everywhere.
I also have some suggestions:
Add kwargs to functions calls
potential_energy(atoms, calc; kwargs...)::Number
forces(atoms, calc; kwargs...)::AbstractVector{<:AbstractVector}
virial(atoms, calc; kwargs...)::AbstractMatrix
This allows passing extra commands down to the calculator, like controlling number of tasks (threads), neighbour list etc. Not implementing this causes the packages that don't catch kwargs to fail, if you pass kwargs to some other calculator. So, you would need to use a wrapper to get around this.
I don't want any concrete types to be part of the definition. Thus, I am against forces returning AbstractVector{<: SVector}
in the definition. For this AbstractVector{<:AbstractVector}
is a better definition. It will of course return SVector in many cases, but it should not be in the definition.
I am also in favour of combined commands like
energy_forces_virial(atoms, calc; kwargs...)::Tuple{...}
energy_forces(atoms, calc; kwargs...)::Tuple{...}
These allow useful optimisations and we should not skip them. Although then comes the question Tuple, NamedTuple or something else for the function return. In my opinion, not having concrete type in the definition is the best here too.
Okay so the benefit is being able to switch out calculators but use the same interface. That makes more sense than what I was thinking, which was that different subtypes of AbstractSystem
implement their own versions of the functions.
As I understand it we dispatch on our own calculators when extending the function in our own packages, e.g.
struct MyCalculator
# ...
end
function AtomsBase.potential_energy(sys::AbstractSystem, calc::MyCalculator)
# Return a number
end
I think system information is required, not just atoms, since the boundary will be required (and velocities may be used in some cases?).
forces(atoms, calc; kwargs...)::AbstractVector{<:AbstractVector}
We can of course allow this (or really nobody has to follow what we document) but I strongly suggest that we strongly suggest that the output is Vector{<: SVector}
or a similar type that can be re-interpreted as a Matrix
and in any case to ensure that the forces are stored contiguously.
I think system information is required, not just atoms, since the boundary will be required
sure atoms
= system in almost every package I know hence I used that term. Please replace atoms
with system
throughout
In my opinion, not having concrete type in the definition is the best here too.
There is no definition since we are just providing the prototypes. There is only documentation. Each implementation can define the outputs how they want but the packages using them (Molly) will make some assumptions. It needs to be documented what those assumptions are.
I strongly suggest that we strongly suggest that the output is
Vector{<: SVector}
or a similar type that can be re-interpreted as aMatrix
and in any case to ensure that the forces are stored contiguously.
This can be part of the definition that what ever is returned must work with reinterpret
to matrix. This leaves options to table on the actual structure.
I'm ok with that but how can it be part of the definition? It can only be part of the documentation?
I'm ok with that but how can it be part of the definition? It can only be part of the documentation?
We can supplement testing function. When you implement the interface you then add the test functions that call with data to see, if it complies needed functionality.
Something like that would be nice.
We could also consider implementing non-allocating force command
forces!(f, atoms, calc; kwargs...)
This would be optimal for the performance. But here we would need to have a definition on the data structure for force.
I would be very happy with that, but didn't want to go overboard on a first PR. But it doesn't hurt to put it into the mix. With that we have
potential_energy(system, calculator; kwargs...)
forces(system, calculator; kwargs...)
forces!(F, system, calculator; kwargs...)
virial(system, calculator; kwargs...)
energy_forces_virial(system, calculator; kwargs...)
energy_forces_virial!(F, system, calculator; kwargs...)
Any thoughts on stress and hessian?
Here's a prototype for Molly. To go in Molly:
export MollyCalculator
struct MollyCalculator{PI, SI, GI, F, E}
pairwise_inters::PI
specific_inter_lists::SI
general_inters::GI
force_units::F
energy_units::E
n_threads::Int
end
function AtomsBase.potential_energy(abstract_sys::AbstractSystem, calc::MollyCalculator)
sys_nointers = System(abstract_sys, calc.energy_units, calc.force_units)
sys = System(
sys_nointers;
pairwise_inters=calc.pairwise_inters,
specific_inter_lists=calc.specific_inter_lists,
general_inters=calc.general_inters,
)
return potential_energy(sys, find_neighbors(sys); n_threads=calc.n_threads)
end
Usage:
using Molly, AtomsBase, AtomsBaseTesting
ab_sys = AbstractSystem(
make_test_system().system;
boundary_conditions = [Periodic(), Periodic(), Periodic()],
bounding_box = [[1.54732, 0.0 , 0.0 ],
[0.0 , 1.4654985, 0.0 ],
[0.0 , 0.0 , 1.7928950]]u"Å",
)
coul = Coulomb(coulomb_const=2.307e-21u"kJ*Å", force_units=u"kJ/Å", energy_units=u"kJ")
calc = MollyCalculator((coul,), (), (), u"kJ/Å", u"kJ", 1)
potential_energy(ab_sys, calc)
2.723905224571781e-20 kJ
This requires Molly master due to an unrelated fix.
Using atom properties such as σ and ϵ might require another field to the calculator. The above approach doesn't require additional arguments but does recompute the neighbours each time.
Any thoughts on stress and hessian?
I don't use them, so no strong opinions.
What about having a generic function
calculate(property::Symbol, system, atoms)
and or even allow multiple properties. The functions above could then just be treated as syntax sugar.
calculate(property::Symbol, system, atoms)
I am against this for a two reasons.
potential_energy(system, calc)
is more readableThird point would be that this would be bad interface design, as compiler does not know before hand what happens - this matters for optimizations and multi threaded design.
I disagree.
It doesn't have to be type-unstable if we replace Symbol
with something else. For example,
calculate(PotentialEnergy, system, calculator)
where PotentialEnergy
is a singleton type.
With the above, readability is note an issue.
And the potential advantage is that (i) one can let the use decide which properties to compute; and (ii) it is easy to extend (and if type stability should be ensured then with Val
) before a new property type or function is introduced into the public interface.
I can see this being especially useful for electronic structure codes which can output far more properties than just energies, forces and virials.
My main issue with this is that, it does not provide anything new. You can already do
calculate( ::PotentialEnergy, data, calc ) = potential_energy(data, calc)
calculate( ::Forces, data, calc ) = forces(data, calc)
etc.
For the implementation point the best is to have individual calls for methods only.
I meant it the other way round.
potential_energy(system, calculator) = calculate(PotentialEnergy, sytem, calculator)
Secondly you can now do
calculate((PotentialEnergy, Forces), system, calculator)
calculate((PotentialEnergy, Forces, Virial), system, calculator)
calculate((Forces, Virial), system, calculator)
But I'm not even sure that part is important.
I'm more interested in going beyond EFV, e.g.,
calculate( :hessian, system, calculator) # or Val{:hessian}
calculate( :preconditioner, system, calculator)
calculate( :site_energies, system, calculator)
because there is no prototype for a hessian or preconditioner or site energies yet.
Or we can get a list of electronic or magnetic or optical properties if the calculator
is an electronic structure method, etc ...
You either need to have
function calculate( ::PotentialEnergy, data, calculator; kwargs...)
# definition here
end
or
function potential_energy(data, calculator; kwargs...)
# definition here
end
Then you can add implementation for the other. To me the latter is just way more explicit. Plus it saves people time when they don't need to write "calculate".
Catching up on GitHub pings, sorry for delay – this discussion overlaps substantially with one @mfherbst, @singularitti, and some others had at JuliaCon and played around with a bit at the hackathon, see https://github.com/JuliaMolSim/ElectronicStructure.jl
Personally, I like the calculate
syntax – while I agree it potentially sacrifices a small amount of readability, the flexibility in dispatch and performance optimization (e.g. as @cortner points out above, to efficiently compute energy and forces in one pass, which is especially important in a DFT context) is worth it in my view. Also, it makes it easy to add on more properties later without having to add a new function to the interface.
If there is sufficient support for the calculate
interface then we can add it to the PR anytime. Another advantage of this is, that it extends naturally to systems that are not atoms.
I'm sorry, I cannot see https://github.com/JuliaMolSim/ElectronicStructure.jl, can I be enrolled in the orgnization?
Sorry, only got to catch up on github pings right now. I'll try to give my input on the points raised in a structured way.
(1) Where should these functions live?
We don't have to speak the final word on this, but I would suggest to put them into a separate package, that depends on AtomsBase, simply because for a number of tasks (e.g. plotting, IO) calculators are unimportant. One of the things that really bothers me with ASE is that it turned out to become a zoo for everything, which I would like to avoid with AtomsBase. Making new packages in Julia is easy enough and there are plenty packages just defining a few functions without any methods. This could well live in a package AtomsCalculators
or so. But as I said ... not urgent for me.
(2) I think we have to be careful about a zoo of methods such as energy_forces_virial
, energy_forces
etc. Also I would lean to having all in singular (i.e. force instead of forces). I'm a little worried about to short names as e.g. a forces
function too easily it clashes with a name one would like to use for a variable. Having a small set of useful function names reserved in the AtomsBase ecosystem sounds advantageous. For my personal taste that is energy, forces and stresses / virals. I'm not sure about the additional forces!
and similar mutating functions as they are of little use in electronic structure theory methods. Instead I would propose (3) below.
(3) I think just having a calculator and a system to drive the calculation is too little as most calculators have state, which can be useful when doing another calculation later. To exploit that the idea we came up with in https://github.com/JuliaMolSim/ElectronicStructure.jl (BTW, public now) was to have for each calculator a number of specific structs:
AtomsCalculators
, such as ElectronicStructure
.The idea is then to have a calculate!
function, which simply takes a (possibly empty) initial state and advances it to the point the user requests (e.g. by marker structs or similar), i.e. also a state
struct would be returned. Functions like energy_forces_virial
would then be used to simply access the content of this state. This works very well with electronic structre codes, but also with fast interatomic potentials, where the state constructor would simply allocate and hold the output arrays like the forces and the calculate!
function would just work on it in-place. If calculate!
is called again with a different system, the data is simply overwitten.
Of course we don't yet have to finalise all details now to have an inital useful version. I'd thus suggest the following:
(a) Maybe put function definitions in a separate package
(b) Use calculate
to return a calculator-specific container of the final state, which may be re-used with calculate!
.
(c) Define a standardised way to access energy, forces, stresses data from the state. This can be functions like energy( )
, but maybe better to avoid namespace clash is to have mandatory fields like energy
, force
etc. with well-defined content or define getproperty
appropritately.
Thanks for the comments, Michael. A few first responses.
force
; is technically incorrect, and it is just not how people use it. calculate
or calculate!
thing in the background with user-facing simplified interfaces (cf the wrapper I mentioned), as long as all that is actually hidden from the user. (And maybe this is what you had in mind?) But I really also want @jgreener64 to be happy with the framework. The three of us represent three typical computational scales.Our initial point was to implement MD and geometry optimisation related interface. Michael is suggesting to extend this to other properties also.
I would say that having a general framework would be a good thing! And, if we pick this route, then a separate package would be the way to go.
For the interface itself, one can identify
For these parts, my opinion is that 4. could be done with keyword arguments.
1, can be in function names or control types that are given to calculate
commands. This should be defined in one place, so that the interface is usable for everyone the same way.
3, is the part where I see things a bit different way than Michael. My opinion is that everything that is specific to a calculator, like internal state (wave function, tmp arrays, etc.), should be packed together to calculator and not to be in different structure. This does lead to a case, where we have to allow mutation of calculator, if we want it to store data like a wave function. So, I would make this a part of definition, that calculators can be mutated.
If I'm reading this right there is a sufficient diversity of opinions that I'd really like to push for an initial lightweight interface within AtomsBase
so we can make progress on the projects we discussed.
But with a plan to replace it with a more general separate package as soon as possible.
I don't have strong opinions on calculate
v specific functions. Maybe calculate
is preferable to the combinatorial explosion of function combinations we could otherwise run into.
I do quite like the simplicity of just passing the system and calculator, with extra system data like neighbours (calculator-dependent) as keyword args, and getting a value back. But yes we would have to allow calculator mutation if we did that.
That doesn't stop packages defining a calculate!
themselves and having calculate
do setup, run calculate!
and return the result.
I prefer forces
to force
.
If this is eventually going to be a separate package, which does make sense, can't it just be a new package directly? If we make an AtomsBase release with this stuff in we would then need to make a breaking release to remove it.
If this is eventually going to be a separate package, which does make sense, can't it just be a new package directly? If we make an AtomsBase release with this stuff in we would then need to make a breaking release to remove it.
Quick vote on this? Do thumbsup
if you want a new package now with minimal funcionality; thumbsdown
if you prefer to keep it here initially, smile
if you are happy either way.
I do quite like the simplicity of just passing the system and calculator, with extra system data like neighbours (calculator-dependent) as keyword args, and getting a value back. But yes we would have to allow calculator mutation if we did that.
thought on that : exactly how the interface is implemented is up to the person implementing it? E.g. for ACE models we will 100% provide BOTH
forces(system, calc)
forces(system, calc, params)
so we can also do for parameter estimation, and eventually one of them will be a fallback for the other.
I'm only saying that I don't think having a simple interface now prevents us from experimenting and expanding it later or changing it.
I realise I'm again the last to respond ... sorry about that.
(And maybe this is what you had in mind?)
Partly yes ... and I agree with better having a small and lean thing first to make some progress and worry about the complicated details later.
exactly how the interface is implemented is up to the person implementing it?
Hmm. That can be tricky, because interfaces are only useful if there are guidelines how they can be used. Thus the more we coordinate, the more useful it becomes :smile:.
I'm ok with forces
instead of force
.
With that in mind from my point of view there are the following things that we need to resolve now:
calculate(Forces(), calc, system)
or forces(calc, system)
makes no difference, but exporting a nice name
like forces
will lead to name clashes and a potential later combinatorial explosion.
With Forces()
I also see clear advantages when dealing with potentially multiple interface layers
(developer-centred / user-centred etc.) later on due to easy multiple dispatch.calc
first or system
first. For me the calc
first has always been natural, but I don't care so much.While I do like the calculate
interface, I actually think forces(...)
is infinitely more readable. I would be sad to not provide both. That said, we don't need to export anything.
A small issue: why calculate(Forces(), ...)
and not calculate(Forces, ...)
? I find the extra () quite ugly - extra noise I don't want to see.
Given the discussion up to this point, I think we now start a new package, where we implement the interface(s) discussed above. I propose we don't export anything initially, and this will allow us to have both the calculate
and forces
type calls.
For the package name, is AtomsCalculators.jl
ok?
Thanks, @cortner, I'm ok with something like AtomsCalculators.forces( ... )
.
Why Forces()
is because than you reserve yourself the possibility to later add additional quantifiers (e.g. you only want the forces on 2 atoms). But maybe this is overengineering it.
And :+1: on the package name :).
Why Forces() is because than you reserve yourself the possibility to later add additional quantifiers (e.g. you only want the forces on 2 atoms). But maybe this is overengineering it.
That's an interesting point. E.g. suppose we want to implement a distributed assembly of energy, forces etc, then you can tell the calculator that way which part of the system it should be computing these on. Something to keep in mind ... but maybe fleshed out when needed.
Have again gotten woefully behind on these notifications, but jumping in to add that I'm for the slightly more "developer-oriented" style (i.e. calculate
) for the interface functions that we specify. Of course there's nothing stopping individual packages "wrapping" that in more specifically-named things for their own purposes, as they likely should to conform to norms/convenience considerations in their own field. But if what we want is interoperability, I think keeping it as generic as reasonable is the way to go.
And calling back to the initial discussions we had when scoping out AtomsBase, I would strongly advocate for just building something that we will inevitably (at least partially) tear down and rebuild; but it's so much easier to have these discussions around actual code than around the hypothetical idea of code that it's not worth the analysis paralysis of worrying whether we have it "just right" before we write it. I'll spoil the ending – we won't have it just right, but we won't be able to know what that even is until we really stress-test it.
analysis paralysis
100% agreed.
A small issue: why
calculate(Forces(), ...)
and notcalculate(Forces, ...)
? I find the extra () quite ugly - extra noise I don't want to see.
@cortner I've considered this question before, and the suggestion I got from Julia's documentation is that instances are generally preferable since you can always add more internal parameters in the future:
The preferred style is to use instances by default, and only add methods involving
Type{MyType}
later if they become necessary to solve some problems.
See also: https://discourse.julialang.org/t/singleton-types-vs-instances-as-type-parameters/2802/2
A good example is that Roots.jl dispatches on instances.
That's a fair point and I'm ok with that.
I'm closing this, since this is now all more or less addressed in ongoing work in AtomsCalculators.
I propose that we stop ~dragging our feet~ sleeping on this and implement prototypes for
We can add more if desired e.g.
come to mind.
potential_energy
could also be called justenergy
, I don't mind either way.This will make an ultra-lightweight interface. If needed it can be moved out of
AtomsBase
into a separate package, but I see no harm having it here for now. IfMolly
were the only thing that does simulations then it could live there, but it's not.If people agree please
:thumbsup
and I'll make a PR.CC @jgreener64 @rkurchin @mfherbst @tjjarvinen
UPDATE: after discussion the current proposal is the following: