JuliaMolSim / AtomsBase.jl

A Julian abstract interface for atomic structures.
https://juliamolsim.github.io/AtomsBase.jl/
MIT License

Consider extending `==` on `Atom`, `FlexibleSystem` with `StructEquality.jl`? #89

Open singularitti opened 8 months ago

singularitti commented 8 months ago

I was adding support for this package in my code so that, in the future, conversions between FlexibleSystem and the Cell types used in CrystallographyBase.jl and Crystallography.jl will be possible.

However, while testing this conversion, I found that Atom and FlexibleSystem contain mutable fields such as Dict, so == does not automatically return true for two instances with equal contents, since:

Value types are intended for compact, immutable objects. They are stored on the stack, passed by value, and the default hash and equality are based on the literal bits in memory. Record types are allocated on the heap, are passed by reference, and the default hash and equality are based on the pointer value (the data address). When you embed a record type in a value type, then the pointer to the record type becomes part of the value type, and so is included in equality and hash.

This makes comparisons between these types feel weird:

julia> using AtomsBase, Unitful, UnitfulAtomic, WhyNotEqual

julia> a = Atom(:H, [0, 0, 1.0]u"bohr")
Atom(H, atomic_number = 1, atomic_mass = 1.008 u):
    position          : [0,0,1]u"a₀"

julia> b = Atom(:H, [0, 0, 1.0]u"bohr")
Atom(H, atomic_number = 1, atomic_mass = 1.008 u):
    position          : [0,0,1]u"a₀"

julia> a == b
false

julia> whynot(==, a, b)
DifferentButSameChildren: When applying `lens` to both objects, we get `obj1` and `obj2`.
obj1 and obj2 are different, but their children are all the same.
lens: identity
obj1: Atom(H,  [       0,        0,        1]u"a₀")
obj2: Atom(H,  [       0,        0,        1]u"a₀")

julia> box = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]u"Å";  # Note the unit!

julia> bc = [Periodic(), Periodic(), Periodic()];

julia> hydrogen = FlexibleSystem(
           [Atom(:H, [0, 0, 1.0]u"bohr"), Atom(:H, [0, 0, 3.0]u"bohr")], box, bc
       );

julia> hydrogen2 = FlexibleSystem(
           [Atom(:H, [0, 0, 1.0]u"bohr"), Atom(:H, [0, 0, 3.0]u"bohr")], box, bc
       );

julia> hydrogen == hydrogen2
false

julia> using WhyNotEqual

julia> whynot(==, hydrogen, hydrogen2)
DifferentButSameChildren: When applying `lens` to both objects, we get `obj1` and `obj2`.
obj1 and obj2 are different, but their children are all the same.
lens: (@optic _.particles[1])
obj1: Atom(H,  [       0,        0,        1]u"a₀")
obj2: Atom(H,  [       0,        0,        1]u"a₀")

As you can see, all of their fields are equal, yet the objects themselves do not compare equal.

This is surprising when you want to compare them, and it is what I ran into earlier: I had to write extra code just to do simple tests.
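The effect can be reproduced with Base alone. The following is a minimal sketch, assuming a hypothetical `Tagged` struct as a stand-in for a type with a `Dict` field (it is not an AtomsBase type):

```julia
# Hypothetical stand-in for an immutable struct with a mutable field.
struct Tagged
    name::Symbol
    data::Dict{Symbol,Any}
end

a = Tagged(:H, Dict{Symbol,Any}())
b = Tagged(:H, Dict{Symbol,Any}())

a.data == b.data   # true: Dict defines a content-based ==
a.data === b.data  # false: two distinct heap objects
a == b             # false: the default == falls back to ===
a == a             # true: same object
```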

One solution is to use StructEquality.@struct_hash_equal_isequal:

using StructEquality

@struct_hash_equal_isequal struct Atom{D, L<:Unitful.Length, V<:Unitful.Velocity, M<:Unitful.Mass}
    position::SVector{D, L}
    velocity::SVector{D, V}
    atomic_symbol::Symbol
    atomic_number::Int
    atomic_mass::M
    data::Dict{Symbol, Any}  # Store arbitrary data about the atom.
end

@struct_hash_equal_isequal struct FlexibleSystem{D,S,L<:Unitful.Length} <: AbstractSystem{D}
    particles::AbstractVector{S}
    bounding_box::SVector{D,SVector{D,L}}
    boundary_conditions::SVector{D,BoundaryCondition}
    data::Dict{Symbol,Any}  # Store arbitrary data about the system.
end

Then all of the above comparisons return true.
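For comparison, here is roughly what one would otherwise write by hand to get field-wise equality and hashing. This is a sketch using only Base and a hypothetical `MiniAtom` type; it is an assumption about the general technique, not the macro's actual expansion:

```julia
# Hypothetical stand-in type; the real Atom has more fields.
struct MiniAtom
    symbol::Symbol
    position::Vector{Float64}
    data::Dict{Symbol,Any}
end

# Compare and hash field by field instead of by object identity.
Base.:(==)(x::MiniAtom, y::MiniAtom) =
    x.symbol == y.symbol && x.position == y.position && x.data == y.data
Base.isequal(x::MiniAtom, y::MiniAtom) =
    isequal(x.symbol, y.symbol) && isequal(x.position, y.position) && isequal(x.data, y.data)
Base.hash(x::MiniAtom, h::UInt) =
    hash(x.data, hash(x.position, hash(x.symbol, hash(:MiniAtom, h))))

p = MiniAtom(:H, [0.0, 0.0, 1.0], Dict{Symbol,Any}())
q = MiniAtom(:H, [0.0, 0.0, 1.0], Dict{Symbol,Any}())

p == q              # true
hash(p) == hash(q)  # true, so both act as the same Dict key or Set element
```

Keeping `hash` consistent with `==` matters as soon as such objects are used as dictionary keys or stored in a `Set`.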

In fact, I use @struct_hash_equal_isequal in all my packages:

@struct_hash_equal_isequal struct SpglibCell{L,P,T,M} <: AbstractCell
    lattice::Lattice{L}
    positions::Vector{MVector{3,P}}
    atoms::Vector{T}
    magmoms::Vector{M}
end

Similar packages are:

rkurchin commented 8 months ago

I think this is a very reasonable suggestion. If other folks agree (e.g. @mfherbst, @cortner, @jgreener64), maybe you could make a PR to this effect?

cortner commented 8 months ago

I don't have a strong view. StructEquality looks useful, but I have never used it myself, so I can't really comment on how practical it is. I'm happy for this to go ahead.

singularitti commented 8 months ago

Hi @rkurchin, sure. I could make a PR to test this change.

mfherbst commented 8 months ago

Sounds like a good quick solution to get a solid behaviour.

One edge case where the desired behaviour deserves a second thought: two atoms or systems that differ only in the extra attributes they store. For example, one atom might carry some extra information (a tag, or a property relevant only to an implementing library) that is not relevant from the AtomsBase point of view. Should the two then compare equal? The safe choice is probably no, but I can see cases where one might argue yes as well. Does anyone have strong feelings about this?

jgreener64 commented 8 months ago

Sounds okay to me.

I would err on the side of safety: I would be surprised if two things compared equal when something was different. The user shouldn't have to worry too much about which bits AtomsBase cares about.

cortner commented 8 months ago

@mfherbst Good point. We will store training data in structures. Not sure how else to do this. Can there be a data dict that is excluded from the == comparison?
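One way this could look, as a sketch only: keep == strict and add a separate, explicitly named comparison that skips the data dict. The `TaggedAtom` type and the `same_structure` helper are hypothetical names for illustration, not part of AtomsBase:

```julia
# Hypothetical stand-in type with an extra-data dict.
struct TaggedAtom
    symbol::Symbol
    position::Vector{Float64}
    data::Dict{Symbol,Any}
end

# Hypothetical helper: compare only the physical fields, skipping `data`.
same_structure(x::TaggedAtom, y::TaggedAtom) =
    x.symbol == y.symbol && x.position == y.position

x = TaggedAtom(:H, [0.0, 0.0, 1.0], Dict{Symbol,Any}(:energy => -0.5))
y = TaggedAtom(:H, [0.0, 0.0, 1.0], Dict{Symbol,Any}())

same_structure(x, y)  # true: the extra training data is ignored
x.data == y.data      # false: a strict field-wise == would tell them apart
```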

mfherbst commented 8 months ago

Btw for testing equality in tests there is also the AtomsBaseTesting package with some utilities.

cortner commented 1 month ago

Could ≈ also be added in a reasonable way?

singularitti commented 1 month ago

StructEquality.jl provides @struct_isapprox, so I think yes?
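For illustration, a hand-written field-wise ≈ might look like the sketch below. The `Point3` type is hypothetical, and this is an assumption about the general approach rather than what the macro generates:

```julia
using LinearAlgebra  # stdlib; provides isapprox for arrays

# Hypothetical stand-in type with one discrete and one floating-point field.
struct Point3
    symbol::Symbol
    position::Vector{Float64}
end

# Exact match on the discrete field, tolerance-based match on the floating-point one.
Base.isapprox(x::Point3, y::Point3; kwargs...) =
    x.symbol == y.symbol && isapprox(x.position, y.position; kwargs...)

u = Point3(:H, [0.0, 0.0, 1.0])
v = Point3(:H, [0.0, 0.0, 1.0 + 1e-12])

u ≈ v                     # true with the default relative tolerance
isapprox(u, v; rtol = 0)  # false: with zero tolerance the positions differ
```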

cortner commented 1 month ago

I'd love to have this PR; I'm currently hand-writing equality and ≈ tests.

singularitti commented 1 month ago

Sorry, I was pretty busy before, but I think I will have time to do this soon.