JuliaMolSim / AtomsBase.jl

A Julian abstract interface for atomic structures.
https://juliamolsim.github.io/AtomsBase.jl/
MIT License
84 stars 18 forks source link

WIP: Towards Type Stability #101

Closed cortner closed 3 months ago

cortner commented 6 months ago

This PR is to address a number of itches I have with the current implementation of the interface and its implementation. So far the goal is to keep it entirely backward compatible. I think this may be possible. I believe I am still very close to backward compatible in the interface, though no longer in the internals.

The plan is to address the following issues:

This is now ready for comments.

Bugs and TODOs

cortner commented 6 months ago

Recording changes

More details:

cortner commented 6 months ago

Question for the community: If I define a system that has open bc in some or all directions (i.e. is infinite in those directions) then the cell-vector pointing in that direction technically doesn't really specify a bounding box anymore. Should the call to bounding box in that case compute the actual extent of the system as specified by the positions of the particles?

My intuition is that this is something that the neighbourlist should do and not the system. I would consider returning the cell vector or NaN or Inf in those cases. But I do not have a strong view yet.

cortner commented 6 months ago

@mfherbst Do you mind explaining this test to me:

@testset "Chemical formula with system" begin
    lattice     = [12u"bohr" * rand(3) for _ in 1:3]
    atoms       = [Atom(6, randn(3)u"Å"; atomic_symbol=:C1),
                   Atom(6, randn(3)u"Å"; atomic_symbol=:C2),
                   Atom(1, randn(3)u"Å"; atomic_symbol=:D),
                   Atom(1, randn(3)u"Å"; atomic_symbol=:D),
                   Atom(1, randn(3)u"Å"; atomic_symbol=:D),
                  ]
    system      = periodic_system(atoms, lattice)
    @test atomic_symbol(system) == [:C1, :C2, :D, :D, :D]
    @test element_symbol(system) == [:C, :C, :H, :H, :H]
    @test chemical_formula(system) == "C₂H₃"
end

I take it the point is to test use of isotopes. I thought we had agreed this needed to be done differently. But I may well misremember that. This seems to be the only thing that prevents me from completing this PR.

Should there be another particle type Isotope?

struct Isotope
   atomic_number::UInt8
   nneutron::Int
end

How many people in this community care about isotopes?

EDIT: I think there are two alternative solutions:

cortner commented 6 months ago

My last question for now: Are the printing tests really needed? Why would we expect the ASCII output to remain unchanged when we make changes to the code?

tjjarvinen commented 6 months ago

I would add isotopes to ChemicalElement.

Other issue that is related to isotopes is that in some cases you want to flag atoms to be different. E.g. you could say that these atoms are can be transformed to each other with symmetry operations and you then flag them with somehow. Another option is to flag atoms based on different chemical environment.

One way to add this would add extra field to chemical element. Albeit it could be moved to somewhere else too.

stuct ChemicalElement
   Z::UInt16
   N::UInt16 
   extra::UInt32
end

Note, I prefer using number of nucleons over number of neutrons, as then number 0 could be interpret, as field ignored. While with number of neutrons this is not possible.

The total bit size of the structure should be 64, 32, 16 or 128. For 64-bit system the structure is machine level will always be multiple of 64, so that size would be good target.

cortner commented 6 months ago

I am not too keen to make ChemicalElement too rich - that goes against the idea of a simple prototype implementation. We could have an extended chemical element type. Anyway, I'm happy to go with whatever the majority thinks is useful.

I'm not too sure the size matters too much at this stage, since it will normally be paired with positions, velocities, mass, etc. ... but it's an interesting suggestion for performance tweaks down the road?

tjjarvinen commented 6 months ago

Cache line is 64 bytes. So, 3*Float64 for positions, mass is Float64, velosity is another 3*Float64. If on top of this chemical element is 64-bits then it totals 64 bytes. Meaning that it would very efficient to move in memory. I would definitely go for this.

We could simply implement

 stuct ChemicalElement
   Z::UInt64
end

stuct ChemicalElementExt
   Z::UInt16
   N::UInt16 
   extra::UInt32
end

With these you can reinterpret the simple ChemicalElement to ChemicalElementExt that has option to have the extra features. Thus we have the default option to go for the simple ChemicalElement. While still have the option to define isotopes and additional features.

cortner commented 6 months ago

I understood the idea. I think it is interesting and I'm happy to be convinced that this is the way to go if enough people agree ... I doubt though that this will be the bottleneck for many interesting applications. Ie anything beyond pair interactions...

mfherbst commented 4 months ago

I take it the point is to test use of isotopes.

Yes and the fact that some input files actually do not store atom types not just based on the chemical symbols, but add numbers in the end, e.g. in cif files you frequently have C1, C2 etc. instead of just C, which not necessarily means that the carbons are different isotopes. This needs to be somehow dealt with (e.g. by separating element from atomic symbol).

Other issue that is related to isotopes is that in some cases you want to flag atoms to be different. E.g. you could say that these atoms are can be transformed to each other with symmetry operations and you then flag them with somehow. Another option is to flag atoms based on different chemical environment.

Yes that is exactly the case that we need to cover. The isotopes are not so relevant indeed.

In that sense the most simple implementation would be perhaps

struct c
   Z::UInt16
   N::UInt16
   id::UInt32
end

or even without the N property ?

My idea here is that the id is used to render otherwise identical objects different. What distinguish them is sort of out of scope at this simplest interface and is up to the implementation (e.g. by having a Dict{ChemicalElement,Any} to store additional information ?)

Are the printing tests really needed?

Not necessarily to test the precise display from my point of view but to ensure the code runs and does not crash. Crashing code on printing is super annoying and should not happen.

cortner commented 4 months ago

In view of discussions in other channels I think this PR will likely be closed unmerged and some of the points raised here brought back after the interface update.

cortner commented 3 months ago

closing since #107 is now at about the same stage as this old PR