canonical implementation

JuliaMolSim / AtomsBase.jl

A Julian abstract interface for atomic structures.

https://juliamolsim.github.io/AtomsBase.jl/

MIT License

84 stars 18 forks source link

canonical implementation #123

Open rkurchin opened 3 weeks ago

rkurchin commented 3 weeks ago

A major discussion point at the workshop yesterday was the idea of having an "official" or "canonical" implementation of the AtomsBase interface (somewhat analogous to ASE.Atoms in Python). Advantages would include:

easy entry point for new users to get started and a concrete type to use in generic documentation about the interface
can hopefully reduce some unnecessary duplication of functionality
avoid confusion about whether the currently included (and arguably not great) prototype implementations are actually meant to be used for things...they could presumably just be moved to documentation as examples and not actually included

Some logistical points to discuss on this:

should this live in this repository? In this repository, but not exported by default? In a different repository?
It was also mentioned that if it lives in a different repo, we might discuss renaming this one to AtomsInterface or similar instead

Implementation discussion points:

needs to be type-stable
how flexible to additional properties do we want to be? (fully arbitrary makes type stability difficult, but probably we want some such flexibility)
should it be mutable or immutable? Or should there be two versions?
Should its length be fixed (i.e. should atoms be able to be added/removed after instantiation)?
Should people be able to directly modify fields or only via setter methods (to enable preservation of any internal consistency between fields, a la issues Ask had mentioned in ASE)
things to build off of could include the versions in DecoratedParticles or ExtXYZ

Going to explicitly ping @cortner, @jgreener64, and @jameskermode here as they either very actively participated in discussion or implemented something that came up in said discussion :) but anyone should chime in with thoughts!

jgreener64 commented 3 weeks ago

I'm pro a canonical implementation that covers most use cases. I would only want it to be in the interface repo if it had negligible include time though. I don't see the harm in it being in a separate package.

The challenge would be agreeing on some minimal required properties, and yes I think there needs to be some way to store arbitrary data. But there needs to be at least some required properties, otherwise we end up defining an empty struct.

mfherbst commented 3 weeks ago

I also think a canonical implementation outside AtomsBase would be good. For me important would be:

Arbitrary properties
A good set of basic properties (e.g. like FlexibleSystem) I think DecoratedParticles might be a good start, actually.

My feeling is that convenience (e.g. mutability, annotation of bond properties etc.) and efficiency/suitability for AD might be contradicting. So maybe we should in the long run even have two implementations, one flexible, one fast ? That makes it easy to convert to/from these things whenever coding generic algorithms like GeometryOptimization does ? Would of course be good if this fragmentation can be avoided.

jameskermode commented 3 weeks ago

+1 for arbitrary properties. This has been critical to the success of ase.Atoms.

cortner commented 3 weeks ago

Apparently Ask is not that happy with it :)

lmiq commented 17 hours ago

I'm somewhat late for the discussion, but maybe I have something to add. I have been recently implementing the reading o mmCIF files in PDBTools.jl because I needed to store and read data for simulation files of 70M atoms. For that, apart from having to optimize the reading of the file, I had to remove a custom dictionary from my Atom data structure, because itself was too large to be stored in memory for every atom, even if empty:

julia> d = Dict{Symbol,Any}()
Dict{Symbol, Any}()

julia> Base.summarysize(d) * 70 * 10^6 / 1024 / 1024
30441.2841796875

(that would be 30GB just to store empty dicts for custom properties).

The result was that my Atom type, which previously always contained an empty Dict, had to be become a parametric Atom{CustomType} which defaults to Atom{Nothing} where the custom field of the struct is instantiated to nothing. With that I can hold in memory the data for the 70M atoms.

As a side note, I kept my Atom implementation mutable.