Energies, Forces, Etc - Githubissues

cortner commented 9 months ago

I propose that we stop ~dragging our feet~ sleeping on this and implement prototypes for

potential_energy
forces
virial

We can add more if desired e.g.

stress
hessian

come to mind. potential_energy could also be called just energy, I don't mind either way.

This will make an ultra-lightweight interface. If needed it can be moved out of AtomsBase into a separate package, but I see no harm having it here for now. If Molly were the only thing that does simulations then it could live there, but it's not.

If people agree please :thumbsup and I'll make a PR.

CC @jgreener64 @rkurchin @mfherbst @tjjarvinen

UPDATE: after discussion the current proposal is the following:

potential_energy(system, calculator; kwargs...)
forces(system, calculator; kwargs...)
forces!(F, system, calculator; kwargs...)
virial(system, calculator; kwargs...)
energy_forces_virial(system, calculator; kwargs...)
energy_forces_virial!(F, system, calculator; kwargs...)

jgreener64 commented 9 months ago

I can see what the output of these functions is, but what is the input? I guess the whole system?

Currently in Molly the first three functions take in the system, the neighbour list and the number of threads but this can be reworked to be just the system.

I prefer potential_energy to energy.

cortner commented 9 months ago

My suggestion is

potential_energy(atoms, calc)::Number
forces(atoms, calc)::AbstractVector{<: SVector}
virial(atoms, calc)::AbstractMatrix

but in some ways, this is not that important. it's really easy to re-interpret arrays or change around arguments. First I want shared prototypes.

Of course if we can agree and document what they should return that's even better.

cortner commented 9 months ago

There could also be an

energy_forces_virial(atoms, calc)::Tuple{...}

to compute everything at the same time, that way we don't need to cache things.

jgreener64 commented 9 months ago

What is calc?

cortner commented 9 months ago

calculator (ASE language). It's the object the calculates things. The potential, etc...

tjjarvinen commented 9 months ago

This is a good idea when there starts to be more packages that use forces. With this you only need to implement one interface for a "calculator" package for it to work everywhere.

I also have some suggestions:

Add kwargs to functions calls

potential_energy(atoms, calc; kwargs...)::Number
forces(atoms, calc; kwargs...)::AbstractVector{<:AbstractVector}
virial(atoms, calc; kwargs...)::AbstractMatrix

This allows passing extra commands down to the calculator, like controlling number of tasks (threads), neighbour list etc. Not implementing this causes the packages that don't catch kwargs to fail, if you pass kwargs to some other calculator. So, you would need to use a wrapper to get around this.

I don't want any concrete types to be part of the definition. Thus, I am against forces returning AbstractVector{<: SVector} in the definition. For this AbstractVector{<:AbstractVector} is a better definition. It will of course return SVector in many cases, but it should not be in the definition.

I am also in favour of combined commands like

energy_forces_virial(atoms, calc; kwargs...)::Tuple{...}
energy_forces(atoms, calc; kwargs...)::Tuple{...}

These allow useful optimisations and we should not skip them. Although then comes the question Tuple, NamedTuple or something else for the function return. In my opinion, not having concrete type in the definition is the best here too.

jgreener64 commented 9 months ago

Okay so the benefit is being able to switch out calculators but use the same interface. That makes more sense than what I was thinking, which was that different subtypes of AbstractSystem implement their own versions of the functions.

As I understand it we dispatch on our own calculators when extending the function in our own packages, e.g.

struct MyCalculator
    # ...
end

function AtomsBase.potential_energy(sys::AbstractSystem, calc::MyCalculator)
    # Return a number
end

I think system information is required, not just atoms, since the boundary will be required (and velocities may be used in some cases?).

cortner commented 9 months ago

forces(atoms, calc; kwargs...)::AbstractVector{<:AbstractVector}

We can of course allow this (or really nobody has to follow what we document) but I strongly suggest that we strongly suggest that the output is Vector{<: SVector} or a similar type that can be re-interpreted as a Matrix and in any case to ensure that the forces are stored contiguously.

cortner commented 9 months ago

I think system information is required, not just atoms, since the boundary will be required

sure atoms = system in almost every package I know hence I used that term. Please replace atoms with system throughout

cortner commented 9 months ago

In my opinion, not having concrete type in the definition is the best here too.

There is no definition since we are just providing the prototypes. There is only documentation. Each implementation can define the outputs how they want but the packages using them (Molly) will make some assumptions. It needs to be documented what those assumptions are.

tjjarvinen commented 9 months ago

I strongly suggest that we strongly suggest that the output is Vector{<: SVector} or a similar type that can be re-interpreted as a Matrix and in any case to ensure that the forces are stored contiguously.

This can be part of the definition that what ever is returned must work with reinterpret to matrix. This leaves options to table on the actual structure.

cortner commented 9 months ago

I'm ok with that but how can it be part of the definition? It can only be part of the documentation?

tjjarvinen commented 9 months ago

I'm ok with that but how can it be part of the definition? It can only be part of the documentation?

We can supplement testing function. When you implement the interface you then add the test functions that call with data to see, if it complies needed functionality.

cortner commented 9 months ago

Something like that would be nice.

tjjarvinen commented 9 months ago

We could also consider implementing non-allocating force command

forces!(f, atoms, calc; kwargs...)

This would be optimal for the performance. But here we would need to have a definition on the data structure for force.

cortner commented 9 months ago

I would be very happy with that, but didn't want to go overboard on a first PR. But it doesn't hurt to put it into the mix. With that we have

potential_energy(system, calculator; kwargs...)
forces(system, calculator; kwargs...)
forces!(F, system, calculator; kwargs...)
virial(system, calculator; kwargs...)
energy_forces_virial(system, calculator; kwargs...)
energy_forces_virial!(F, system, calculator; kwargs...)

Any thoughts on stress and hessian?

jgreener64 commented 9 months ago

Here's a prototype for Molly. To go in Molly:

export MollyCalculator

struct MollyCalculator{PI, SI, GI, F, E}
    pairwise_inters::PI
    specific_inter_lists::SI
    general_inters::GI
    force_units::F
    energy_units::E
    n_threads::Int
end

function AtomsBase.potential_energy(abstract_sys::AbstractSystem, calc::MollyCalculator)
    sys_nointers = System(abstract_sys, calc.energy_units, calc.force_units)
    sys = System(
        sys_nointers;
        pairwise_inters=calc.pairwise_inters,
        specific_inter_lists=calc.specific_inter_lists,
        general_inters=calc.general_inters,
    )
    return potential_energy(sys, find_neighbors(sys); n_threads=calc.n_threads)
end

Usage:

using Molly, AtomsBase, AtomsBaseTesting

ab_sys = AbstractSystem(
    make_test_system().system; 
    boundary_conditions = [Periodic(), Periodic(), Periodic()],
    bounding_box = [[1.54732, 0.0      , 0.0      ],
                    [0.0    , 1.4654985, 0.0      ],
                    [0.0    , 0.0      , 1.7928950]]u"Å",
)

coul = Coulomb(coulomb_const=2.307e-21u"kJ*Å", force_units=u"kJ/Å", energy_units=u"kJ")
calc = MollyCalculator((coul,), (), (), u"kJ/Å", u"kJ", 1)
potential_energy(ab_sys, calc)

2.723905224571781e-20 kJ

This requires Molly master due to an unrelated fix.

Using atom properties such as σ and ϵ might require another field to the calculator. The above approach doesn't require additional arguments but does recompute the neighbours each time.

Any thoughts on stress and hessian?

I don't use them, so no strong opinions.

cortner commented 9 months ago

What about having a generic function

calculate(property::Symbol, system, atoms)

and or even allow multiple properties. The functions above could then just be treated as syntax sugar.

tjjarvinen commented 9 months ago

calculate(property::Symbol, system, atoms)

I am against this for a two reasons.

It makes code harder to read - potential_energy(system, calc) is more readable
It is type unstable. This most likely is not a problem for Julia, but if we ever want to make C-interface out of it, it will lead to problems. And is simply the best to avoid them as much as possible before hand.

Third point would be that this would be bad interface design, as compiler does not know before hand what happens - this matters for optimizations and multi threaded design.

cortner commented 9 months ago

I disagree.

It doesn't have to be type-unstable if we replace Symbol with something else. For example,
```
calculate(PotentialEnergy, system, calculator)
```
where PotentialEnergy is a singleton type.
With the above, readability is note an issue.

And the potential advantage is that (i) one can let the use decide which properties to compute; and (ii) it is easy to extend (and if type stability should be ensured then with Val) before a new property type or function is introduced into the public interface.

I can see this being especially useful for electronic structure codes which can output far more properties than just energies, forces and virials.

tjjarvinen commented 9 months ago

My main issue with this is that, it does not provide anything new. You can already do

calculate( ::PotentialEnergy, data, calc ) = potential_energy(data, calc)
calculate( ::Forces, data, calc ) = forces(data, calc)

etc.

For the implementation point the best is to have individual calls for methods only.

cortner commented 9 months ago

I meant it the other way round.

potential_energy(system, calculator) = calculate(PotentialEnergy, sytem, calculator)

Secondly you can now do

calculate((PotentialEnergy, Forces), system, calculator) 
calculate((PotentialEnergy, Forces, Virial), system, calculator) 
calculate((Forces, Virial), system, calculator)

But I'm not even sure that part is important.

I'm more interested in going beyond EFV, e.g.,

calculate( :hessian, system, calculator)   # or Val{:hessian}
calculate( :preconditioner, system, calculator)
calculate( :site_energies, system, calculator)

because there is no prototype for a hessian or preconditioner or site energies yet.

Or we can get a list of electronic or magnetic or optical properties if the calculator is an electronic structure method, etc ...

tjjarvinen commented 9 months ago

You either need to have


function calculate( ::PotentialEnergy, data, calculator; kwargs...)
      # definition here
end

or

function potential_energy(data, calculator; kwargs...)
     # definition here
end

Then you can add implementation for the other. To me the latter is just way more explicit. Plus it saves people time when they don't need to write "calculate".

rkurchin commented 9 months ago

Catching up on GitHub pings, sorry for delay – this discussion overlaps substantially with one @mfherbst, @singularitti, and some others had at JuliaCon and played around with a bit at the hackathon, see https://github.com/JuliaMolSim/ElectronicStructure.jl

Personally, I like the calculate syntax – while I agree it potentially sacrifices a small amount of readability, the flexibility in dispatch and performance optimization (e.g. as @cortner points out above, to efficiently compute energy and forces in one pass, which is especially important in a DFT context) is worth it in my view. Also, it makes it easy to add on more properties later without having to add a new function to the interface.

cortner commented 9 months ago

If there is sufficient support for the calculate interface then we can add it to the PR anytime. Another advantage of this is, that it extends naturally to systems that are not atoms.

singularitti commented 9 months ago

I'm sorry, I cannot see https://github.com/JuliaMolSim/ElectronicStructure.jl, can I be enrolled in the orgnization?

mfherbst commented 9 months ago

Sorry, only got to catch up on github pings right now. I'll try to give my input on the points raised in a structured way.

(1) Where should these functions live? We don't have to speak the final word on this, but I would suggest to put them into a separate package, that depends on AtomsBase, simply because for a number of tasks (e.g. plotting, IO) calculators are unimportant. One of the things that really bothers me with ASE is that it turned out to become a zoo for everything, which I would like to avoid with AtomsBase. Making new packages in Julia is easy enough and there are plenty packages just defining a few functions without any methods. This could well live in a package AtomsCalculators or so. But as I said ... not urgent for me.

(2) I think we have to be careful about a zoo of methods such as energy_forces_virial, energy_forces etc. Also I would lean to having all in singular (i.e. force instead of forces). I'm a little worried about to short names as e.g. a forces function too easily it clashes with a name one would like to use for a variable. Having a small set of useful function names reserved in the AtomsBase ecosystem sounds advantageous. For my personal taste that is energy, forces and stresses / virals. I'm not sure about the additional forces! and similar mutating functions as they are of little use in electronic structure theory methods. Instead I would propose (3) below.

(3) I think just having a calculator and a system to drive the calculation is too little as most calculators have state, which can be useful when doing another calculation later. To exploit that the idea we came up with in https://github.com/JuliaMolSim/ElectronicStructure.jl (BTW, public now) was to have for each calculator a number of specific structs:

a calculator struct: This would mostly be a marker struct, maybe define the basic setup, i.e. which executable to use, which thread setup etc.
a parameter struct: This defines discretisation, tolerances, numerical details: Allows to convert between similar calculators or easily construct adaptive iterative schemes, where the parameters are changed, but the rest is identical. Some fields in here would be standardised across calculator classes (e.g. DFT calculators). Details could be worked out in more specific sub-packages to AtomsCalculators, such as ElectronicStructure.
and a state struct: Contains a reference to the state the calculator is in (e.g. pointer to directory where files are dropped, pointer to struct containing intermediate results etc.)

The idea is then to have a calculate! function, which simply takes a (possibly empty) initial state and advances it to the point the user requests (e.g. by marker structs or similar), i.e. also a state struct would be returned. Functions like energy_forces_virial would then be used to simply access the content of this state. This works very well with electronic structre codes, but also with fast interatomic potentials, where the state constructor would simply allocate and hold the output arrays like the forces and the calculate! function would just work on it in-place. If calculate! is called again with a different system, the data is simply overwitten.

Of course we don't yet have to finalise all details now to have an inital useful version. I'd thus suggest the following:

(a) Maybe put function definitions in a separate package (b) Use calculate to return a calculator-specific container of the final state, which may be re-used with calculate!. (c) Define a standardised way to access energy, forces, stresses data from the state. This can be functions like energy( ), but maybe better to avoid namespace clash is to have mandatory fields like energy, force etc. with well-defined content or define getproperty appropritately.

cortner commented 9 months ago

Thanks for the comments, Michael. A few first responses.

Your parameters, structs is following the Lux framework, which I like a lot, but I think this is a developer-oriented and not a user-oriented interface. Too much overhead for many use-cases. I would personally have the "calculator" be wrapper containing the model, the parameters and the state.
We are trying to provide an interface for many possible use-cases, materials and molecules, fast empirical potentials, slow MLFFs, and electronic structure, not just electronic structure. For some of those, the allocation is the bottleneck and in-place functions are important.
I vote against the singular force; is technically incorrect, and it is just not how people use it.
I am open to the idea of a calculate or calculate! thing in the background with user-facing simplified interfaces (cf the wrapper I mentioned), as long as all that is actually hidden from the user. (And maybe this is what you had in mind?) But I really also want @jgreener64 to be happy with the framework. The three of us represent three typical computational scales.

tjjarvinen commented 9 months ago

Our initial point was to implement MD and geometry optimisation related interface. Michael is suggesting to extend this to other properties also.

I would say that having a general framework would be a good thing! And, if we pick this route, then a separate package would be the way to go.

For the interface itself, one can identify

What is the final result - energy, forces, virial, (dipole moment, polarisability, etc.)
AtomsBase input structure
Calculator that does the calculation itself
Some way to give extra information or tune the calculation

For these parts, my opinion is that 4. could be done with keyword arguments.

1, can be in function names or control types that are given to calculate commands. This should be defined in one place, so that the interface is usable for everyone the same way.
3, is the part where I see things a bit different way than Michael. My opinion is that everything that is specific to a calculator, like internal state (wave function, tmp arrays, etc.), should be packed together to calculator and not to be in different structure. This does lead to a case, where we have to allow mutation of calculator, if we want it to store data like a wave function. So, I would make this a part of definition, that calculators can be mutated.

cortner commented 9 months ago

If I'm reading this right there is a sufficient diversity of opinions that I'd really like to push for an initial lightweight interface within AtomsBase so we can make progress on the projects we discussed.

But with a plan to replace it with a more general separate package as soon as possible.

jgreener64 commented 9 months ago

I don't have strong opinions on calculate v specific functions. Maybe calculate is preferable to the combinatorial explosion of function combinations we could otherwise run into.

I do quite like the simplicity of just passing the system and calculator, with extra system data like neighbours (calculator-dependent) as keyword args, and getting a value back. But yes we would have to allow calculator mutation if we did that.

That doesn't stop packages defining a calculate! themselves and having calculate do setup, run calculate! and return the result.

I prefer forces to force.

If this is eventually going to be a separate package, which does make sense, can't it just be a new package directly? If we make an AtomsBase release with this stuff in we would then need to make a breaking release to remove it.

cortner commented 9 months ago

If this is eventually going to be a separate package, which does make sense, can't it just be a new package directly? If we make an AtomsBase release with this stuff in we would then need to make a breaking release to remove it.

Quick vote on this? Do thumbsup if you want a new package now with minimal funcionality; thumbsdown if you prefer to keep it here initially, smile if you are happy either way.

cortner commented 9 months ago

I do quite like the simplicity of just passing the system and calculator, with extra system data like neighbours (calculator-dependent) as keyword args, and getting a value back. But yes we would have to allow calculator mutation if we did that.

thought on that : exactly how the interface is implemented is up to the person implementing it? E.g. for ACE models we will 100% provide BOTH

forces(system, calc) 
forces(system, calc, params)

so we can also do for parameter estimation, and eventually one of them will be a fallback for the other.

I'm only saying that I don't think having a simple interface now prevents us from experimenting and expanding it later or changing it.

mfherbst commented 9 months ago

I realise I'm again the last to respond ... sorry about that.

(And maybe this is what you had in mind?)

Partly yes ... and I agree with better having a small and lean thing first to make some progress and worry about the complicated details later.

exactly how the interface is implemented is up to the person implementing it?

Hmm. That can be tricky, because interfaces are only useful if there are guidelines how they can be used. Thus the more we coordinate, the more useful it becomes :smile:.

I'm ok with forces instead of force.

With that in mind from my point of view there are the following things that we need to resolve now:

Specification of the interface: function names versus marker structs. I think from the user point of view calculate(Forces(), calc, system) or forces(calc, system) makes no difference, but exporting a nice name like forces will lead to name clashes and a potential later combinatorial explosion. With Forces() I also see clear advantages when dealing with potentially multiple interface layers (developer-centred / user-centred etc.) later on due to easy multiple dispatch.
Mutation in the calculator or not: Here I think I would be ok with mutation if it is part of the interface definition how to decompose the calculator into the object that contains the state and the objects that contain state-less configuration (parameters, general setup etc.). Because this then offers a clear way how to convert between a more user-centred interface (calculator just one object) to a more developer-centred interface (state traversal clear). So we should enforce that there are functions around that achieve that.
calc first or system first. For me the calc first has always been natural, but I don't care so much.

cortner commented 8 months ago

While I do like the calculate interface, I actually think forces(...) is infinitely more readable. I would be sad to not provide both. That said, we don't need to export anything.

A small issue: why calculate(Forces(), ...) and not calculate(Forces, ...) ? I find the extra () quite ugly - extra noise I don't want to see.

Given the discussion up to this point, I think we now start a new package, where we implement the interface(s) discussed above. I propose we don't export anything initially, and this will allow us to have both the calculate and forces type calls.

For the package name, is AtomsCalculators.jl ok?

mfherbst commented 8 months ago

Thanks, @cortner, I'm ok with something like AtomsCalculators.forces( ... ).

Why Forces() is because than you reserve yourself the possibility to later add additional quantifiers (e.g. you only want the forces on 2 atoms). But maybe this is overengineering it.

And :+1: on the package name :).

cortner commented 8 months ago

Why Forces() is because than you reserve yourself the possibility to later add additional quantifiers (e.g. you only want the forces on 2 atoms). But maybe this is overengineering it.

That's an interesting point. E.g. suppose we want to implement a distributed assembly of energy, forces etc, then you can tell the calculator that way which part of the system it should be computing these on. Something to keep in mind ... but maybe fleshed out when needed.

rkurchin commented 8 months ago

Have again gotten woefully behind on these notifications, but jumping in to add that I'm for the slightly more "developer-oriented" style (i.e. calculate) for the interface functions that we specify. Of course there's nothing stopping individual packages "wrapping" that in more specifically-named things for their own purposes, as they likely should to conform to norms/convenience considerations in their own field. But if what we want is interoperability, I think keeping it as generic as reasonable is the way to go.

And calling back to the initial discussions we had when scoping out AtomsBase, I would strongly advocate for just building something that we will inevitably (at least partially) tear down and rebuild; but it's so much easier to have these discussions around actual code than around the hypothetical idea of code that it's not worth the analysis paralysis of worrying whether we have it "just right" before we write it. I'll spoil the ending – we won't have it just right, but we won't be able to know what that even is until we really stress-test it.

cortner commented 8 months ago

analysis paralysis

100% agreed.

singularitti commented 6 months ago

A small issue: why calculate(Forces(), ...) and not calculate(Forces, ...) ? I find the extra () quite ugly - extra noise I don't want to see.

@cortner I've considered this question before, and the suggestion I got from Julia's documentation is that instances are generally preferable since you can always add more internal parameters in the future:

The preferred style is to use instances by default, and only add methods involving Type{MyType} later if they become necessary to solve some problems.

A good example is that Roots.jl dispatches on instances.

cortner commented 6 months ago

That's a fair point and I'm ok with that.

cortner commented 1 month ago

I'm closing this, since this is now all more or less addressed in ongoing work in AtomsCalculators.

JuliaMolSim / AtomsBase.jl

Energies, Forces, Etc #84