JuliaMolSim / Molly.jl

Molecular simulation in Julia
Other
372 stars 51 forks source link

make neighbor list search with CellListMap.jl non-allocating #41

Closed lmiq closed 2 years ago

lmiq commented 2 years ago

@jgreener64

I am just opening the pull request so that the changes are easier to visualize. Needs polishing and fixing of the Int type for the tests to pass.

I have changed a little bit the implementation of the neighborlist structure to update in place the lists and cell lists from step to step in the simulations. This should bring some performance benefits (and much less memory usage).

Most tests are passing, but some are failing when you change from Int64 to Int32 integer types (I'm unsure where is this being defined, and it does not seem to be ready to be propagated to the type of neighbour list.

Otherwise the tests seem to be passing. If you want to take a look at this, and see if in the cases where it runs (Int64) the performance is improved, I can polish what is missing to make everything to work.

The current approach (in which NeighbourList is a structure that contains both the number of pairs and an array of pairs that does not need to be replaced each time) can be used for the other approaches of computing neighbor lists, with the same advantages.

jgreener64 commented 2 years ago

Thanks for doing this, updating the neighbour lists as required is a definite algorithmic improvement.

About the failing Float32 (not Int32) tests, adding the type to line 192 of neighbors.jl seems to fix it: Box([10*one(T),10*one(T),10*one(T)],one(T); T=T). Although this box is replaced, I guess later values are promoted to its type. Tests then pass up to the error shown below.

I ran the following benchmark script, which should be portable:

using Molly
using DelimitedFiles

data_dir = normpath(dirname(pathof(Molly)), "..", "data")
ff_dir = joinpath(data_dir, "force_fields")
openmm_dir = joinpath(data_dir, "openmm_6mrr")

ff = OpenMMForceField(joinpath.(ff_dir, ["ff99SBildn.xml", "tip3p_standard.xml", "his.xml"])...)

atoms, atoms_data, specific_inter_lists, general_inters, neighbor_finder, coords, box_size = setupsystem(
    joinpath(data_dir, "6mrr_equil.pdb"), ff)

n_steps = 500
timestep = 0.0005u"ps"
temp = 300.0u"K"
velocities = SVector{3}.(eachrow(readdlm(joinpath(openmm_dir, "velocities_300K.txt"))))u"nm * ps^-1"

s = Simulation(
    simulator=VelocityVerlet(),
    atoms=atoms,
    atoms_data=atoms_data,
    specific_inter_lists=specific_inter_lists,
    general_inters=general_inters,
    coords=coords,
    velocities=velocities,
    temperature=temp,
    box_size=box_size,
    neighbor_finder=neighbor_finder,
    thermostat=AndersenThermostat(1.0u"ps"),
    timestep=timestep,
    n_steps=n_steps,
)

parallel = false
simulate!(s, 5; parallel=parallel)
@time simulate!(s, n_steps; parallel=parallel)

parallel = true
simulate!(s, 5; parallel=parallel)
@time simulate!(s, n_steps; parallel=parallel)

With the current Molly I get 120 s on 1 thread and 28 s on 16 threads. With the new change I get 18 mins on 1 thread and an error on 16 threads:

ERROR: LoadError: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:322 [inlined]
 [2] threading_run(func::Function)
   @ Base.Threads ./threadingconstructs.jl:38
 [3] macro expansion
   @ ./threadingconstructs.jl:97 [inlined]
 [4] UpdateCellList!(x::Vector{SVector{3, Float64}}, box::CellListMap.Box{CellListMap.OrthorhombicCell, 3, Float64, 9}, cl::CellListMap.CellList{3, Float64}, aux::CellListMap.AuxThreaded{3, Float64}; parallel::Bool)
   @ CellListMap ~/.julia/packages/CellListMap/dfbQh/src/CellLists.jl:617
 [5] find_neighbors!(s::Simulation{false, Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}, Vector{Atom{Quantity{Float64, 𝐈 𝐓, Unitful.FreeUnits{(C,), 𝐈 𝐓, nothing}}, Quantity{Float64, 𝐌, Unitful.FreeUnits{(u,), 𝐌, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{AtomData}, Vector{SVector{3, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}}}, Vector{SVector{3, Quantity{Float64, 𝐋 𝐓^-1, Unitful.FreeUnits{(nm, ps^-1), 𝐋 𝐓^-1, nothing}}}}, Tuple{LennardJones{false, DistanceCutoff{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐋^2, Unitful.FreeUnits{(nm^2,), 𝐋^2, nothing}}, Quantity{Float64, 𝐋^-2, Unitful.FreeUnits{(nm^-2,), 𝐋^-2, nothing}}}, Float64, Unitful.FreeUnits{(kJ, nm^-1, mol^-1), 𝐋 𝐌 𝐍^-1 𝐓^-2, nothing}, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}, CoulombReactionField{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Float64, Float64, Quantity{Float64, 𝐋^3 𝐌 𝐍^-1 𝐈^-2 𝐓^-4, Unitful.FreeUnits{(C^-2, kJ, nm, mol^-1), 𝐋^3 𝐌 𝐍^-1 𝐈^-2 𝐓^-4, nothing}}, Unitful.FreeUnits{(kJ, nm^-1, mol^-1), 𝐋 𝐌 𝐍^-1 𝐓^-2, nothing}, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}, Quantity{Float64, 𝐋^2, Unitful.FreeUnits{(nm^2,), 𝐋^2, nothing}}, Quantity{Float64, 𝐋^-3, Unitful.FreeUnits{(nm^-3,), 𝐋^-3, nothing}}, Quantity{Float64, 𝐋^-1, Unitful.FreeUnits{(nm^-1,), 𝐋^-1, nothing}}}}, Tuple{Vector{HarmonicBond{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, nm^-2, mol^-1), 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{HarmonicAngle{Float64, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{PeriodicTorsion{Float64, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{PeriodicTorsion{Float64, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}}, SVector{3, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}}, Quantity{Float64, 𝐓, Unitful.FreeUnits{(ps,), 𝐓, nothing}}, Unitful.FreeUnits{(kJ, nm^-1, mol^-1), 𝐋 𝐌 𝐍^-1 𝐓^-2, nothing}, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}, nf::CellListMapNeighborFinder{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, 3, Float64}, step_n::Int64; parallel::Bool)
   @ Molly ~/.julia/dev/Molly/src/neighbors.jl:257
 [6] simulate!(s::Simulation{false, Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}, Vector{Atom{Quantity{Float64, 𝐈 𝐓, Unitful.FreeUnits{(C,), 𝐈 𝐓, nothing}}, Quantity{Float64, 𝐌, Unitful.FreeUnits{(u,), 𝐌, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{AtomData}, Vector{SVector{3, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}}}, Vector{SVector{3, Quantity{Float64, 𝐋 𝐓^-1, Unitful.FreeUnits{(nm, ps^-1), 𝐋 𝐓^-1, nothing}}}}, Tuple{LennardJones{false, DistanceCutoff{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐋^2, Unitful.FreeUnits{(nm^2,), 𝐋^2, nothing}}, Quantity{Float64, 𝐋^-2, Unitful.FreeUnits{(nm^-2,), 𝐋^-2, nothing}}}, Float64, Unitful.FreeUnits{(kJ, nm^-1, mol^-1), 𝐋 𝐌 𝐍^-1 𝐓^-2, nothing}, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}, CoulombReactionField{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Float64, Float64, Quantity{Float64, 𝐋^3 𝐌 𝐍^-1 𝐈^-2 𝐓^-4, Unitful.FreeUnits{(C^-2, kJ, nm, mol^-1), 𝐋^3 𝐌 𝐍^-1 𝐈^-2 𝐓^-4, nothing}}, Unitful.FreeUnits{(kJ, nm^-1, mol^-1), 𝐋 𝐌 𝐍^-1 𝐓^-2, nothing}, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}, Quantity{Float64, 𝐋^2, Unitful.FreeUnits{(nm^2,), 𝐋^2, nothing}}, Quantity{Float64, 𝐋^-3, Unitful.FreeUnits{(nm^-3,), 𝐋^-3, nothing}}, Quantity{Float64, 𝐋^-1, Unitful.FreeUnits{(nm^-1,), 𝐋^-1, nothing}}}}, Tuple{Vector{HarmonicBond{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, nm^-2, mol^-1), 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{HarmonicAngle{Float64, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{PeriodicTorsion{Float64, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{PeriodicTorsion{Float64, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}}, SVector{3, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}}, Quantity{Float64, 𝐓, Unitful.FreeUnits{(ps,), 𝐓, nothing}}, Unitful.FreeUnits{(kJ, nm^-1, mol^-1), 𝐋 𝐌 𝐍^-1 𝐓^-2, nothing}, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}, ::VelocityVerlet, n_steps::Int64; parallel::Bool)
   @ Molly ~/.julia/dev/Molly/src/simulators.jl:41
 [7] simulate!(s::Simulation{false, Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}, Vector{Atom{Quantity{Float64, 𝐈 𝐓, Unitful.FreeUnits{(C,), 𝐈 𝐓, nothing}}, Quantity{Float64, 𝐌, Unitful.FreeUnits{(u,), 𝐌, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{AtomData}, Vector{SVector{3, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}}}, Vector{SVector{3, Quantity{Float64, 𝐋 𝐓^-1, Unitful.FreeUnits{(nm, ps^-1), 𝐋 𝐓^-1, nothing}}}}, Tuple{LennardJones{false, DistanceCutoff{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐋^2, Unitful.FreeUnits{(nm^2,), 𝐋^2, nothing}}, Quantity{Float64, 𝐋^-2, Unitful.FreeUnits{(nm^-2,), 𝐋^-2, nothing}}}, Float64, Unitful.FreeUnits{(kJ, nm^-1, mol^-1), 𝐋 𝐌 𝐍^-1 𝐓^-2, nothing}, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}, CoulombReactionField{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Float64, Float64, Quantity{Float64, 𝐋^3 𝐌 𝐍^-1 𝐈^-2 𝐓^-4, Unitful.FreeUnits{(C^-2, kJ, nm, mol^-1), 𝐋^3 𝐌 𝐍^-1 𝐈^-2 𝐓^-4, nothing}}, Unitful.FreeUnits{(kJ, nm^-1, mol^-1), 𝐋 𝐌 𝐍^-1 𝐓^-2, nothing}, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}, Quantity{Float64, 𝐋^2, Unitful.FreeUnits{(nm^2,), 𝐋^2, nothing}}, Quantity{Float64, 𝐋^-3, Unitful.FreeUnits{(nm^-3,), 𝐋^-3, nothing}}, Quantity{Float64, 𝐋^-1, Unitful.FreeUnits{(nm^-1,), 𝐋^-1, nothing}}}}, Tuple{Vector{HarmonicBond{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, nm^-2, mol^-1), 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{HarmonicAngle{Float64, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{PeriodicTorsion{Float64, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}, Vector{PeriodicTorsion{Float64, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}}}}, SVector{3, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}}, Quantity{Float64, 𝐓, Unitful.FreeUnits{(ps,), 𝐓, nothing}}, Unitful.FreeUnits{(kJ, nm^-1, mol^-1), 𝐋 𝐌 𝐍^-1 𝐓^-2, nothing}, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}, n_steps::Int64; parallel::Bool)
   @ Molly ~/.julia/dev/Molly/src/simulators.jl:153
 [8] top-level scope
   @ ./timing.jl:210

    nested task error: BoundsError: attempt to access 1-element Vector{CellListMap.CellList{3, Float64}} at index [2]
    Stacktrace:
     [1] getindex
       @ ./array.jl:835 [inlined]
     [2] macro expansion
       @ ~/.julia/packages/CellListMap/dfbQh/src/CellLists.jl:618 [inlined]
     [3] (::CellListMap.var"#45#threadsfor_fun#59"{Vector{SVector{3, Float64}}, CellListMap.Box{CellListMap.OrthorhombicCell, 3, Float64, 9}, CellListMap.AuxThreaded{3, Float64}, UnitRange{Int64}})(onethread::Bool)
       @ CellListMap ./threadingconstructs.jl:85
     [4] (::CellListMap.var"#45#threadsfor_fun#59"{Vector{SVector{3, Float64}}, CellListMap.Box{CellListMap.OrthorhombicCell, 3, Float64, 9}, CellListMap.AuxThreaded{3, Float64}, UnitRange{Int64}})()
       @ CellListMap ./threadingconstructs.jl:52
in expression starting at /home/jgreener/.julia/dev/Molly/bench_cnl.jl:42

Any idea what is going on there?

lmiq commented 2 years ago

Thanks, the error on the threading I know what it is (a simple mistake of mine). I will investigate what went wrong with the performance, it should definitely improve (even if only a little) with those changes.

lmiq commented 2 years ago

Hello again.

  1. Of course the huge performance difference was because of type instabilities. I have added type parameters to the Simulation structure that annotate the types of neighbor_finder and neighbors. That solved the performance issue.

  2. The threading problem was that I was initializing the cell lists with a single particle in the CellListMapNeighborFinder constructor, and internally CellListMap will set the number of threads to be used to 1 when there are a few particles. Now I use the size of the nb_array there to get the actual number of particles.

  3. About performance: Since you are updating the lists only once at every 10 steps, not much happened concerning the time of this version relative to the previous one. However, the allocations were reduced, and this can be hugely beneficial if the lists are updated more frequently (i. e. n_steps=1) or for longer runs. Here is one example with n_steps=1:

With 8 threads in my 4-core laptop:

Previous:

julia> @time simulate!(s, 500; parallel=true)
Progress: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| Time: 0:01:26
 86.843611 seconds (210.55 M allocations: 118.804 GiB, 3.55% gc time)
15954-element Vector{SVector{3, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}}}:
 [2.6112604176052434 nm, 3.9260074065643074 nm, 4.798127029064249 nm]
 [2.5522208246513327 nm, 3.9355161036807775 nm, 4.880192723178161 nm]

This one:

julia> @time simulate!(s, 500; parallel=true)
Progress: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| Time: 0:01:24
 84.630563 seconds (205.31 M allocations: 16.763 GiB, 1.38% gc time)
15954-element Vector{SVector{3, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}}}:
 [2.5146909893906777 nm, 3.8768142444584583 nm, 4.75519458433594 nm]
 [2.492187131673443 nm, 3.8767121593322265 nm, 4.852794872443657 nm]

Note that the allocations went down from 118GiB to 16GiB, although the effect of this on the computer time was not great. With more threads I think the different will be even greater.

(probably it is a good idea to work on these ~200M allocations there).

  1. The new neighbors structure, of type NeighborList is a wrapper on the list that allows the list to be updated without having to empty it every time. We keep the list, and emptying it is just setting the number of elements of the list to 0. This is necessary to update the lists in place, and by overloading the empty!, push! and append! functions, this will work as well for the other list constructors that you have implemented there (Julia is so cool). Thus, I expect that you get some allocation improvements for the other list constructors as well, even if their code didn't change.

  2. I've added a Base.show for NeighborFinder, because the show of CellListMapNeighborFinder is way too crowded:

julia> neighbor_finder = DistanceNeighborFinder(nb_matrix=s.neighbor_finder.nb_matrix,matrix_14=s.neighbor_finder.matrix_14, n_steps=1,                                                              dist_cutoff=1.2u"nm")
DistanceNeighborFinder{Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}}
  Size of nb_matrix = (15954, 15954)
  n_steps = 1
  dist_cutoff = 1.2 nm
  1. To make the neighbor search completely non-allocating when using CellListMap, I had to get read of the conversion that removed the units (ustrip.(coordinates), something like that). To do so, I have implemented a new feature on CellListMap by which you can overload a strip_value function internal to CellListMap and use the coordinates as they are, without that previous conversion to standard floats. This is available in version 0.5.24 which is now the latest. In your package I've added this:
# Add method to strip_value from cell list map to pass the 
# coordinates with units without having to reallocate the vector 
# requires CellListMap >= 0.54 
CellListMap.strip_value(x::Unitful.Quantity) = Unitful.ustrip(x)

which will take care of the stripping internally without having to copy the array of coordinates.

Seems that all tests are passing.

lmiq commented 2 years ago

Just updated a small interface change to the strip function of CellListMap (requires CellListMap 0.6, which I just released. I've updated the compat entry here for that)

jgreener64 commented 2 years ago

Great, I'll review on Monday but it basically looks good to go.

As you say a more efficient neighbour list lets you update every step, allowing the NL distance cutoff to be equal to the non-bonded interaction cutoff and saving on interaction calculations. For example a sphere of radius 1.0 nm has 58% of the volume of a sphere of radius 1.2 nm. I can test what speedup that gives.

lmiq commented 2 years ago

Indeed, the shorter the cutoff the better (that part is what scales with n^2). A small test here, using 1.0 nm and n_step=1, gives me 53 seconds (vs. 86 with 1.2nm).

With 1.0nm and n_step=2 I get 40s, which is the same I get with 1.2nm and n_step=10.

IMO we should track the displacements and update only if necessary (that is O(n)).

(I've updated the initialization of the cell lists, with a patch to "recover" the side without having it as a parameter of the constructor. Ideally one would initialize it with the actual coordinates and sides).

codecov[bot] commented 2 years ago

Codecov Report

Merging #41 (891d9bf) into master (e11bd4b) will decrease coverage by 0.37%. The diff coverage is 74.60%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #41      +/-   ##
==========================================
- Coverage   85.73%   85.35%   -0.38%     
==========================================
  Files          22       22              
  Lines        1283     1325      +42     
==========================================
+ Hits         1100     1131      +31     
- Misses        183      194      +11     
Impacted Files Coverage Ξ”
src/types.jl 84.61% <64.70%> (-11.22%) :arrow_down:
src/forces.jl 87.93% <66.66%> (ΓΈ)
src/neighbors.jl 81.44% <76.92%> (-1.42%) :arrow_down:
src/analysis.jl 100.00% <100.00%> (ΓΈ)
src/setup.jl 94.78% <100.00%> (ΓΈ)

Continue to review full report at Codecov.

Legend - Click here to learn more Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data Powered by Codecov. Last update e11bd4b...891d9bf. Read the comment docs.

lmiq commented 2 years ago

I followed the above comments as much as possible. Let me know if you think something else is required.

jgreener64 commented 2 years ago

Thanks for this.