JuliaMolSim / Molly.jl

Molecular simulation in Julia
Other
372 stars 51 forks source link

Implementation of temperature REMD #64

Closed JaydevSR closed 1 year ago

JaydevSR commented 2 years ago

Includes:

Ruibin-Liu commented 2 years ago

The current interface seems not friendly to implement other types of replica exchange schemes like pH replica exchange and general Hamiltonian replica exchange simulations.

JaydevSR commented 2 years ago

However I think the temperature data should be stored at the level of the simulator rather than at the level of the system, in line with the other thermostats. In this case that would mean removing references to temperature in these two structs and having a TemperatureREMD simulator or similar, which would contain temperature data in its struct along with the thermostat to use. What do you think?

I was thinking about this as well as we will need different "sub" simulators for each replica with different coupling temperature for temp-REMD, so this will be better. We still need to maintain common indices between the simulator replicas and the system replicas so that there is no ambiguity. I will look into simplifying this.

JaydevSR commented 2 years ago

I have added the supporting functions to the ReplicaSystem type in addition to the suggested edits. Many function for System and ReplicaSystem are identical. Should I use a union type for that to simplify the code? Next I am also working of a simulator type TemperatureREMD and a method of simulate! for that.

codecov[bot] commented 2 years ago

Codecov Report

Merging #64 (b82d1c7) into master (7cf63d6) will decrease coverage by 0.28%. The diff coverage is 81.75%.

@@            Coverage Diff             @@
##           master      #64      +/-   ##
==========================================
- Coverage   85.26%   84.98%   -0.29%     
==========================================
  Files          31       31              
  Lines        3190     3323     +133     
==========================================
+ Hits         2720     2824     +104     
- Misses        470      499      +29     
Impacted Files Coverage Δ
src/types.jl 76.83% <73.68%> (-3.87%) :arrow_down:
src/simulators.jl 96.49% <88.70%> (-2.91%) :arrow_down:
src/loggers.jl 95.97% <100.00%> (+0.28%) :arrow_up:
src/chain_rules.jl 46.80% <0.00%> (-5.32%) :arrow_down:
src/zygote.jl 77.84% <0.00%> (+0.29%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

jgreener64 commented 2 years ago

Good changes. Yes you can use a union type, possibly we could introduce an abstract type later.

I think the AtomsBase interface requires that the index given to position and velocity is the atom index, so the replica index plus atom index used here would be confusing. Possibly use the first replica for now in these functions?

JaydevSR commented 2 years ago

For making the simulation, I first define a simulation type:

struct TemperatureREMD{N, T, S, SD}
    temps::StaticVector{N, T}
    indices::StaticVector{N, Int}
    simulators::SD
end

A constructor is defined separately for this type that I am omitting here right now. Then a new methods for the simulate! function:

function simulate!(;
    sys::ReplicaSystem,
    sim::TemperatureREMD,
    n_cycles::Int,
    cycle_length::Int;
    parallel=true
    )

    if sys.n_replicas != length(sim.indices)
        error("Number of replicas in ReplicaSystem and simulators in TemperatureREMD must match.")
    end
    for cycle=1:n_cycles
        for idx in sim.indices
            simulate!(sys.replicas[idx], sim.simulators[idx], cycle_length; parallel=parallel)  #flag1
        end

        if cycle != n_cycles
            n = rand(1:sys.n_replicas)
            m = mod(n, sys.n_replicas) + 1
            β_n, β_m = 1/sim.temps[n], 1/sim.temps[m]
            V_n, V_m = potential_energy(sys.replicas[n]), potential_energy(sys.replicas[m])  #flag2
            Δ = (β_m - β_n)*(V_n - V_m)
            if Δ <= 0 || rand() < exp(-Δ)
                # exchange coordinates
                sys.replicas[n].coords, sys.replicas[m].coords = sys.replicas[m].coords, sys.replicas[n].coords
                # scale velocities
                sys.replicas[n].velocities *= sqrt(β_n/β_m)
                sys.replicas[m].velocities *= sqrt(β_m/β_n)
            end
        end
    end
end

Note: Right now the replica simulation is not in parallel.

The main issue here is that sys.replicas[i] has type IndividualReplica for which many of other function will need new methods, but if it had the type System then all of this will work without any problem and we don't need to make any changes to other functions (#flag1 and #flag2) or define new methods. But if we make all the replicas their own systems, then there is issue of redundant information in all of them. Any suggestions?

Reference for T-REMD algorithm that I am using: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6484850/

jgreener64 commented 2 years ago

The simulator outline looks good. In order to follow the simulate!(sys, sim, n_steps) convention it might be better to store an exchange_time in TemperatureREMD and calculate n_cycles and cycle_length in simulate!.

You raise an interesting point about types. I think the most flexible approach might just be to have multiple Systems. This opens up the possibility of setting up the systems differently too, which some people might find use for. Longer term each replica would need a different box_size anyway (barostats change the box size) and I think each replica should have its own neighbour finder since CellListMapNeighborFinder stores some neighbour info in the struct.

That leaves the atoms, atoms_data and interaction fields as duplicated, but since arrays are stored by reference each System can just point to the same memory for this. You could still have ReplicaSystem which just holds a list of Systems and does relevant setup in the constructor.

How does that sound?

jgreener64 commented 2 years ago

(Accidentally clicked close sorry)

The simulator outline looks good. In order to follow the simulate!(sys, sim, n_steps) convention it might be better to store an exchange_time in TemperatureREMD and calculate n_cycles and cycle_length in simulate!.

You raise an interesting point about types. I think the most flexible approach might just be to have multiple Systems. This opens up the possibility of setting up the systems differently too, which some people might find use for. Longer term each replica would need a different box_size anyway (barostats change the box size) and I think each replica should have its own neighbour finder since CellListMapNeighborFinder stores some neighbour info in the struct.

That leaves the atoms, atoms_data and interaction fields as duplicated, but since arrays are stored by reference each System can just point to the same memory for this. You could still have ReplicaSystem which just holds a list of Systems and does relevant setup in the constructor.

How does that sound?

JaydevSR commented 2 years ago

For the constructor, in order to support multiple sub simulators and couplings, I am having to write multiple checks:

function TemperatureREMD(;
    simulator, exchange_time, temps, dt, 
    kwargs...)
    S = simulator
    T = eltype(temps)
    N = length(temps)
    ET = typeof(exchange_time)
    temps = SA[temps...]
    indices = SA[collect(1:length(temps))...]

    coupling_required = false
    if simulator ∈ [VelocityVerlet, Verlet, StormerVerlet]
        coupling_required = true
        coupling = kwargs[:coupling]
    end

    if coupling_required && coupling ∈ [AndersenThermostat, BerendsenThermostat]
        couplings = [coupling(temps[i], kwargs[:coupling_const]) for i in indices]
    elseif coupling_required && coupling ∈ [RescaleThermostat]
        couplings = [coupling(temps[i]) for i in indices]
    elseif coupling_required
        error("Temperature coupling required for TemperatureREMD using $(simulator) simulator.")
    end

    if simulator==Langevin
        simulators = Dict(
            [i, simulator(dt=dt, temperature=temps[i], friction=kwargs[:friction],
            remove_CM_motion=kwargs[:remove_CM_motion])]
            for i in indices
        )
    elseif simulator ∈ [VelocityVerlet, Verlet]
        simulators = Dict(
            [i, simulator(dt=dt, temperature=temps[i], coupling=couplings[i],
            remove_CM_motion=kwargs[:remove_CM_motion])]
            for i in indices
        )
    elseif simulator == StormerVerlet
        simulators = Dict(
            [i, simulator(dt=dt, temperature=temps[i], coupling=couplings[i])]
            for i in indices
        )
    else
        error("Simulator type $(simulator) not supported.")
    end
    SD = typeof(simulators)

    return TemperatureREMD{N, T, S, SD, ET}(temps, indices, simulators, exchange_time)
end

But when new simulators are added, this function will also need to be changed. This does not seem good.

jgreener64 commented 2 years ago

I agree it doesn't scale well. This could just be left to the user for now, i.e. the constructor takes a list of simulators as an argument and checks that it is the same length as the list of temperatures. The docs would make clear that the simulators should be set up with the appropriate coupling/temperatures.

JaydevSR commented 2 years ago

I was looking a bit into how to parallelize the whole replica simulations and would like to make a few points about that:

  1. Right now the parallel=true versions of other functions use Threads.@threads macro, which is statically scheduled. So, If I use Threads.@threads for parallelizing replica simulation, then only one thread will be available for those functions.
  2. My suggestion is we use the @sync and @spawn for all the parallelization, this will change:
    Threads.@threads for i=1:n
        do_some_work()
    end

    to:

    @sync for i=1:n
        Threads.@spawn do_some_work()
    end
  3. Also, from julia version 1.8 onwards, the behavior of Threads.@thread macro is going to change. And it will be dynamically scheduled by default. See: https://github.com/JuliaLang/julia/blob/v1.8.0-rc1/NEWS.md#multi-threading-changes. So maybe these changes won't be necessary after the release of v1.8?
jgreener64 commented 2 years ago

I think target Julia 1.8 behaviour, it may be that we drop support for Julia 1.7 soon anyway depending on advances in the AD landscape. The parallel behaviour here doesn't have to mirror the existing behaviour though since it is more complex.

As you suggest it would be useful to be able to run each simulation on a certain number of threads, e.g. 16 cores could run 4 replicas on 4 cores each. Is this possible using the threads setup? The current parallel keyword argument may be insufficient here, I have considered changing it to n_threads across the whole package previously. Would that help?

jgreener64 commented 2 years ago

Also, sorry for that large breaking change recently. Hopefully there will be fewer breaking changes as the right API emerges!

JaydevSR commented 2 years ago

I have added the simulator for TemperatureREMD. Right now I am using the parallel kwarg but I agree with you that its usage in this case is not ideal.

The current @sync and Threads.@spawn method chooses a free thread to perform one cycle of the simulation and say if there are 4 replicas then 4 threads will be used leaving the other threads free for the other parallel calls. But it is not ensured that all the threads are distributed equally among all the replicas (which may be problematic if the number of nested parallel calls is close to the number of free threads due to which workload may not be evenly balanced).

About using using an n_threads argument, in that case we can make use of the Threads.foreach() function (see: https://docs.julialang.org/en/v1/base/multi-threading/#Base.Threads.foreach) which seems useful in our case.

Also, I performed some initial simulations (akin to the Leonard Jones gas example in docs) to check the sanity of the simulator and here are some results, for 4 replicas of the system at 100 K, 105 K, 110 K and 115 K:

  1. Temperature of each replica with steps temperature_of_each_replica
  2. X-coordinate of one of atoms with time for each replica (change in color denotes a replica exhange): x_coordiantes_of_atom_7_in_each_replica

Still I am a bit skeptical about the replica exchange part and I will check it once again to ensure that I am not doing something wrong.

jgreener64 commented 2 years ago

Yes I think let's switch to n_threads then, could you do that? It would involve replacing parallel=false with n_threads=1 and parallel=true with n_threads=default_n_threads throughout, where default_n_threads is defined alongside default_float as Threads.nthreads(). Then change the relevant logic in forces.

With regards to testing I would use a large temperature range, at least 50 K difference per replica, and have an exchange time slow enough to see temperature equilibration. You could use a Langevin integrator or a lower Andersen coupling constant if it is struggling to equilibrate.

JaydevSR commented 2 years ago

@jgreener64 Okay, I will try making these changes. Also thanks for suggestions on test simulations.

JaydevSR commented 2 years ago

It seems that making these changes using just the base library will be a bit more of a trouble than we need to handle. Instead JuliaFolds/FLoops.jl, already supports such a change in number of threads utilized. See: https://juliafolds.github.io/data-parallelism/howto/faq/#set-nthreads-at-run-time. This library will also support much more different flavors of thread based parallelization (via JuliaFolds/FoldsThreads.jl) which can come in handy for load balancing. Should I use this library if adding it as a dependency won't be a problem?

jgreener64 commented 2 years ago

Taking the dependency is fine so yes go ahead and try it out if it provides functionality beyond base Julia. Long term I think we will change parallel to n_threads anyway though.

I guess using FLoops.jl may also allow the CPU forces path https://github.com/JuliaMolSim/Molly.jl/blob/3bed491a9d183492262d5ec323a0f509923e51f8/src/force.jl#L221 to be written more concisely.

JaydevSR commented 2 years ago

Okay then I will experiment with a bit and make a PR if I am able to get it to work as expected.

JaydevSR commented 1 year ago

I ran the simulator on Leonard-Jones gas and it seem to work without any problems. What possible tests can I make that will better highlight the replica exchange at work that this basic simulation? After this I can write some unit tests to improve coverage.

jgreener64 commented 1 year ago

Great. I think a Lennard-Jones simulation is okay for now but it would be good to see the graphs you had previously with a clearer temperature profile.

It also would be useful to log when the exchanges took place, then compare the number of exchanges over a simulation to an expected number. Perhaps the ReplicaSystem could have a loggers field, with the default being a new ReplicaExchangeLogger that records the time step, the indices exchanged and Δ for each exchange.

On the topic of loggers, I think ReplicaSystem shouldn't store replica_loggers in its struct. replica_loggers should be used in the constructor as is, then the loggers can be accessed with replica_system.replicas[i].loggers.

JaydevSR commented 1 year ago

I performed a test simulation on 3 replicas of Leonard-Jones gas on 8 threads, using DistanceNeighborFinder and a Langevin simulator.

temp_vals = [120.0u"K", 160.0u"K", 200.0u"K"]
simulator = TemperatureREMD(
    dt=0.005u"ps",
    temperatures=temp_vals,
    simulators=[
        Langevin(
            dt=0.005u"ps",
            temperature=tmp,
            friction=0.1u"ps^-1",
        )
        for tmp in temp_vals],
    exchange_time=2.5u"ps",
);

The exchanges were logged with the help of the new ReplicaExchangeLogger and we have the following exchanges in form of (step, exchanged indices) after simulating for 20_000 steps:

 (9500, (2, 3))
 (12500, (1, 2))
 (15500, (1, 2))
 (17000, (3, 1))
 (18000, (2, 3))
 (18500, (2, 3))

These are plots for temperatures of each replica and the x-coordinates of the one of the atoms in these replicas:

  1. Temperatures: temperature_of_each_replica

  2. Coordinates: x_coordiantes_of_atom_7_in_each_replica

Edit: Changed the plots after fixing the velocity exchange

JaydevSR commented 1 year ago

Benchmarks

  1. System with 1000 atoms and DistanceNeighborFinder, LeonardJones interactions using Langevin simulator for 5000 steps (2 threads): 77.253948 seconds (582.38 k allocations: 4.966 GiB, 0.87% gc time)
  2. ReplicaSystem with 4 replicas of above system with same simulator and steps (8 threads): 173.750242 seconds (2.18 M allocations: 20.255 GiB, 2.47% gc time, 0.45% compilation time)

Ratio of times: 2.25

JaydevSR commented 1 year ago

I did profiling on the simulate! function and the bottleneck seems to be the part where the replicas are being exchanged. Most of the part there is type unstable. I will try to make it type stable and see if there are any improvements.

JaydevSR commented 1 year ago

It seems that I misinterpreted the profiles before. The actual issue with the drop in performance seems to do with task scheduling. Here are the flame graph of the profiles: overall_flame_graph find_neighbors_focus_flame_graph

In the second image we can see this more clearly, more than 60% of time is spent in the wait() in the call stack of find_neighbors!. This as far as I can think should be because of clashing scheduling of @floops and @spawn. I think I may be able to fix this by using one of the different executors for the parallel loop in find_neighbors!. See: https://juliafolds.github.io/FoldsThreads.jl/dev/#FoldsThreads.WorkStealingEx. I will see if this works.

JaydevSR commented 1 year ago

So I experimented a bit to see if there can be an improvement or not. After trying different executors and a pure julia implementation also, the case with 2 threads 1 system and 8 threads 4 replicas, gives no improvement. But for a 1 thread 1 system and 4 threads 4 replicas, the ratio was improved a bit to about 1.6, as my machine supports at most 8 threads so I think the wait time was high in the first case because all threads were getting used. So, maybe it is best to use less than the maximum available threads for the replica simulation to avoid such clashes.

But still julia v1.8 will bring some changes to multithreading, so it will be interesting to see if that gives any improvement for nested threaded loops.

jgreener64 commented 1 year ago

Nice one :tada:

Thinking about next steps, it might be worth considering how to extend this to non-temperature REMD simulators. The system type and the logger are already independent of temperature. Would it be possible to modify the simulate! function to take in a function that runs the lines

if dimension(sys.energy_units) == u"𝐋^2 * 𝐌 * 𝐍^-1 * 𝐓^-2"
    k_b = sys.k * T(Unitful.Na)
else
    k_b = sys.k
end
T_n, T_m = sim.temperatures[n], sim.temperatures[m]
β_n, β_m = 1/(k_b*T_n), 1/(k_b*T_m)
neighbors_n = find_neighbors(sys.replicas[n], sys.replicas[n].neighbor_finder; n_threads=n_threads)
neighbors_m = find_neighbors(sys.replicas[m], sys.replicas[m].neighbor_finder; n_threads=n_threads)
V_n, V_m = potential_energy(sys.replicas[n], neighbors_n), potential_energy(sys.replicas[m], neighbors_m)
Δ = (β_m - β_n)*(V_n - V_m)
should_exchange = Δ <= 0 || rand(rng) < exp(-Δ)
return should_exchange, Δ

and determines whether an exchange should take place? Then simulators such as TemperatureREMD can be a particular case of a generic REMD simulator that takes this function as one of its arguments, and the simulation code can be reused.

JaydevSR commented 1 year ago

Yay 🎊 .

Your suggestion seem excellent. This will help in generalizing the simulation code. I will go ahead and make these additions. Also, I think I can try implementing the Hamiltonian-REMD next, that would require the replicas to have different potentials. So, I will also need to make some changes to ReplicaSystem to allow for this.