JuliaMolSim / AtomsBase.jl

A Julian abstract interface for atomic structures.
https://juliamolsim.github.io/AtomsBase.jl/
MIT License
84 stars 18 forks source link

canonical implementation #123

Open rkurchin opened 3 weeks ago

rkurchin commented 3 weeks ago

A major discussion point at the workshop yesterday was the idea of having an "official" or "canonical" implementation of the AtomsBase interface (somewhat analogous to ASE.Atoms in Python). Advantages would include:

Some logistical points to discuss on this:

Implementation discussion points:

Going to explicitly ping @cortner, @jgreener64, and @jameskermode here as they either very actively participated in discussion or implemented something that came up in said discussion :) but anyone should chime in with thoughts!

jgreener64 commented 3 weeks ago

I'm pro a canonical implementation that covers most use cases. I would only want it to be in the interface repo if it had negligible include time though. I don't see the harm in it being in a separate package.

The challenge would be agreeing on some minimal required properties, and yes I think there needs to be some way to store arbitrary data. But there needs to be at least some required properties, otherwise we end up defining an empty struct.

mfherbst commented 3 weeks ago

I also think a canonical implementation outside AtomsBase would be good. For me important would be:

My feeling is that convenience (e.g. mutability, annotation of bond properties etc.) and efficiency/suitability for AD might be contradicting. So maybe we should in the long run even have two implementations, one flexible, one fast ? That makes it easy to convert to/from these things whenever coding generic algorithms like GeometryOptimization does ? Would of course be good if this fragmentation can be avoided.

jameskermode commented 3 weeks ago

+1 for arbitrary properties. This has been critical to the success of ase.Atoms.

cortner commented 3 weeks ago

Apparently Ask is not that happy with it :)

lmiq commented 17 hours ago

I'm somewhat late for the discussion, but maybe I have something to add. I have been recently implementing the reading o mmCIF files in PDBTools.jl because I needed to store and read data for simulation files of 70M atoms. For that, apart from having to optimize the reading of the file, I had to remove a custom dictionary from my Atom data structure, because itself was too large to be stored in memory for every atom, even if empty:

julia> d = Dict{Symbol,Any}()
Dict{Symbol, Any}()

julia> Base.summarysize(d) * 70 * 10^6 / 1024 / 1024
30441.2841796875

(that would be 30GB just to store empty dicts for custom properties).

The result was that my Atom type, which previously always contained an empty Dict, had to be become a parametric Atom{CustomType} which defaults to Atom{Nothing} where the custom field of the struct is instantiated to nothing. With that I can hold in memory the data for the 70M atoms.

As a side note, I kept my Atom implementation mutable.