WIP: Towards Type Stability

cortner commented 6 months ago

This PR is to address a number of itches I have with the current implementation of the interface and its implementation. ~~So far the goal is to keep it entirely backward compatible. I think this may be possible.~~ I believe I am still very close to backward compatible in the interface, though no longer in the internals.

The plan is to address the following issues:

[x] #99
[x] #97
[x] #49
[x] #5
[x] some discussions elsewhere about having a type-stable and isbits atoms type.
[x] an ad-hoc decision to make FlexibleSystem mutable since it is supposed to be flexible? This will be good for AtomsBuilder
[x] #102

This is now ready for comments.

Bugs and TODOs

[ ] make all tests pass
[ ] write tests to show equivalent when using FastAtom
[ ] write tests that show performance when using FastAtom
[ ] cannot reinterpret ChemicalElement as UInt8
[ ] I would like to delete or deprecate DirichletZero
[ ] I would like to replace AtomView with FastAtom for fast_system[i]. The latter is "isbits" and hence performant. Also much much easier to use and implement, basically working like an SVector.

cortner commented 6 months ago

Recording changes

Introduce PCell type to encode a computational cell with pbc flags
Introduce SystemWithCell{D, TCELL} to help with dispatch
Adapt FlexibleSystem and FastSystem to accept PCell instead of bounding_box and boundary_conditions
Introduce FastAtom to be compatible with FastSystem
Made FlexibleSystem mutable.
make Bounding box and bc tuples
introduce OpenBC to replace DirichletZero

More details:

The P in PCell can stand for periodic or for "parallelepiped" (cc Rachel). I don't care really. But I didn't want to call it Cell. If somebody has a better name I will be happy to change it.
bounding box and bc should be tuples instead of vectors since (i) inferrable length and (ii) there should be no arithmetic performed on them.
The behaviour of isinfinite has changed, it now returns a D-tuple since the system can be infinite in some dimension but not others.

cortner commented 6 months ago

Question for the community: If I define a system that has open bc in some or all directions (i.e. is infinite in those directions) then the cell-vector pointing in that direction technically doesn't really specify a bounding box anymore. Should the call to bounding box in that case compute the actual extent of the system as specified by the positions of the particles?

My intuition is that this is something that the neighbourlist should do and not the system. I would consider returning the cell vector or NaN or Inf in those cases. But I do not have a strong view yet.

cortner commented 6 months ago

@mfherbst Do you mind explaining this test to me:

@testset "Chemical formula with system" begin
    lattice     = [12u"bohr" * rand(3) for _ in 1:3]
    atoms       = [Atom(6, randn(3)u"Å"; atomic_symbol=:C1),
                   Atom(6, randn(3)u"Å"; atomic_symbol=:C2),
                   Atom(1, randn(3)u"Å"; atomic_symbol=:D),
                   Atom(1, randn(3)u"Å"; atomic_symbol=:D),
                   Atom(1, randn(3)u"Å"; atomic_symbol=:D),
                  ]
    system      = periodic_system(atoms, lattice)
    @test atomic_symbol(system) == [:C1, :C2, :D, :D, :D]
    @test element_symbol(system) == [:C, :C, :H, :H, :H]
    @test chemical_formula(system) == "C₂H₃"
end

I take it the point is to test use of isotopes. I thought we had agreed this needed to be done differently. But I may well misremember that. This seems to be the only thing that prevents me from completing this PR.

Should there be another particle type Isotope?

struct Isotope
   atomic_number::UInt8
   nneutron::Int
end

How many people in this community care about isotopes?

EDIT: I think there are two alternative solutions:

one would be to simply add the nneutron field to ChemicalElement
the second would be to maintain a list of isotopes in chem.jl and give those isotopes ids starting with 1000. E.g. 1001 => D. Then the chemical element type gets modified to become
```
struct ChemicalElement
id::Int 
end 
```
with id no longer specifying atomic number but an index into the list of possible atom types. This isn't without danger. One would need to maintain permanent backward compatibility of ids I think and I probably prefer the isotope .

cortner commented 6 months ago

My last question for now: Are the printing tests really needed? Why would we expect the ASCII output to remain unchanged when we make changes to the code?

tjjarvinen commented 6 months ago

I would add isotopes to ChemicalElement.

Other issue that is related to isotopes is that in some cases you want to flag atoms to be different. E.g. you could say that these atoms are can be transformed to each other with symmetry operations and you then flag them with somehow. Another option is to flag atoms based on different chemical environment.

One way to add this would add extra field to chemical element. Albeit it could be moved to somewhere else too.

stuct ChemicalElement
   Z::UInt16
   N::UInt16 
   extra::UInt32
end

Note, I prefer using number of nucleons over number of neutrons, as then number 0 could be interpret, as field ignored. While with number of neutrons this is not possible.

The total bit size of the structure should be 64, 32, 16 or 128. For 64-bit system the structure is machine level will always be multiple of 64, so that size would be good target.

cortner commented 6 months ago

I am not too keen to make ChemicalElement too rich - that goes against the idea of a simple prototype implementation. We could have an extended chemical element type. Anyway, I'm happy to go with whatever the majority thinks is useful.

I'm not too sure the size matters too much at this stage, since it will normally be paired with positions, velocities, mass, etc. ... but it's an interesting suggestion for performance tweaks down the road?

tjjarvinen commented 6 months ago

Cache line is 64 bytes. So, 3*Float64 for positions, mass is Float64, velosity is another 3*Float64. If on top of this chemical element is 64-bits then it totals 64 bytes. Meaning that it would very efficient to move in memory. I would definitely go for this.

We could simply implement

 stuct ChemicalElement
   Z::UInt64
end

stuct ChemicalElementExt
   Z::UInt16
   N::UInt16 
   extra::UInt32
end

With these you can reinterpret the simple ChemicalElement to ChemicalElementExt that has option to have the extra features. Thus we have the default option to go for the simple ChemicalElement. While still have the option to define isotopes and additional features.

cortner commented 6 months ago

I understood the idea. I think it is interesting and I'm happy to be convinced that this is the way to go if enough people agree ... I doubt though that this will be the bottleneck for many interesting applications. Ie anything beyond pair interactions...

mfherbst commented 4 months ago

I take it the point is to test use of isotopes.

Yes and the fact that some input files actually do not store atom types not just based on the chemical symbols, but add numbers in the end, e.g. in cif files you frequently have C1, C2 etc. instead of just C, which not necessarily means that the carbons are different isotopes. This needs to be somehow dealt with (e.g. by separating element from atomic symbol).

Other issue that is related to isotopes is that in some cases you want to flag atoms to be different. E.g. you could say that these atoms are can be transformed to each other with symmetry operations and you then flag them with somehow. Another option is to flag atoms based on different chemical environment.

Yes that is exactly the case that we need to cover. The isotopes are not so relevant indeed.

In that sense the most simple implementation would be perhaps

struct c
   Z::UInt16
   N::UInt16
   id::UInt32
end

or even without the N property ?

My idea here is that the id is used to render otherwise identical objects different. What distinguish them is sort of out of scope at this simplest interface and is up to the implementation (e.g. by having a Dict{ChemicalElement,Any} to store additional information ?)

Are the printing tests really needed?

Not necessarily to test the precise display from my point of view but to ensure the code runs and does not crash. Crashing code on printing is super annoying and should not happen.

cortner commented 4 months ago

In view of discussions in other channels I think this PR will likely be closed unmerged and some of the points raised here brought back after the interface update.

cortner commented 3 months ago

closing since #107 is now at about the same stage as this old PR

JuliaMolSim / AtomsBase.jl

WIP: Towards Type Stability #101

Bugs and TODOs

Recording changes