dsb-lab / CellBasedModels.jl

Julia package for multicellular modeling
MIT License

Accuracy problem on GPU platform #48

Closed dirypan closed 3 weeks ago

dirypan commented 10 months ago

Hi, I found that with some integrators (say DifferentialEquations.Tsit5()), changing dt leads to different results on the GPU platform, while the problem does not appear on the CPU platform. Here is code that should reproduce the problem.

using CellBasedModels
import DifferentialEquations

model_test_2D2D = ABM(2,
    agent = Dict(
        :R => Float32,
    ),
    model = Dict(
        :DL => Float32,
        :S_radius => Float32,
        :S_duration => Float32,
        :S_on => Float32,
        :aR => Float32,
        :rR => Float32,
    ),
    medium = Dict(
        :L => Float32,
    ),

    # Agent ODE: receptor R is produced in proportion to the local ligand L
    # and degrades at rate rR.
    agentODE = quote
        dt(R) = aR*L - rR*R
    end,

    # Medium ODE: ligand diffusion; each border copies the nearest interior
    # value, i.e. zero-flux boundaries.
    mediumODE = quote
        if @mediumInside()
            dt(L) = DL*(@∂2(1,L)+@∂2(2,L))
        elseif @mediumBorder(1,-1)
            L = L[2,i2_]
        elseif @mediumBorder(1,1)
            L = L[NMedium[1]-1,i2_]
        elseif @mediumBorder(2,-1)
            L = L[i1_,2]
        elseif @mediumBorder(2,1)
            L = L[i1_,NMedium[2]-1]
        end
    end,

    # Discrete rule: inject ligand inside a disc of radius S_radius around
    # the origin for t < S_duration, scaled by the medium cell area dx*dy.
    mediumRule = quote
        if @mediumInside()
            if (xₘ^2 + yₘ^2) < S_radius^2 && (t < S_duration)
                L += S_on*(dt/(dx*dy))
            end
        end
    end,

    platform = GPU(),
    agentAlg = DifferentialEquations.Tsit5(),
    mediumAlg = DifferentialEquations.Tsit5(),
);

# Simulation box, agent grid, and model parameters
min_x = -3
max_x = 3
paddel_x = 0.05
min_y = -3
max_y = 3
paddel_y = 0.05
min_z = -0.3
max_z = 0.3
Ncell_x = 120
Ncell_y = 120

com = Community(
    model_test_2D2D,
    N = Ncell_x*Ncell_y,
    dt = 0.02,
    simBox = [min_x max_x; min_y max_y],
    NMedium = [600,600],
)

com.DL = 0.1
com.aR = 0.4
com.rR = 1.
com.S_on = 0.1
com.S_duration = 1.
com.S_radius = 0.2

# Place agents on a regular Ncell_x × Ncell_y grid inside the padded box.
x = range(com.simBox[1,1]+paddel_x, com.simBox[1,2]-paddel_x, length=Ncell_x)
y = range(com.simBox[2,1]+paddel_y, com.simBox[2,2]-paddel_y, length=Ncell_y)
com.x = repeat(x, inner=[Ncell_y])
com.y = repeat(y, outer=[Ncell_x]);

evolve!(com, steps=200, saveEach=5)  # 200 steps of dt = 0.02 (t = 4.0), saving every 5th step

d = getParameter(com, [:x,:y,:R,:L,:t])
t_idx = length(d[:t])            # index of a saved step to inspect (here, the last one)
maximum(d[:L][t_idx][300,:])     # peak ligand along row 300 of the medium grid

On the CPU platform, the above code outputs 275.75217f0 no matter what dt is used (changing steps and saveEach accordingly). On the GPU platform, it outputs 267.54324f0 with dt = 0.02 and 275.75208f0 with dt = 0.005.

Since both the CPU and GPU runs use Float32 and this problem is not stiff, I wonder what the root of the problem could be. (I tried on an A100 and a 3090 Ti; the same problem exists on both.) If it is related to the default parameters of the integrators (say, abstol/reltol), how can I modify them in the algorithm assignment?

gatocor commented 10 months ago

You should be able to pass them via the *SolveArgs arguments of ABM.

https://dsb-lab.github.io/CellBasedModels.jl/stable/API.html#CellBasedModels.ABM
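
For reference, tolerances are ordinary keyword arguments of DifferentialEquations.solve, which the SolveArgs mechanism presumably forwards. A minimal standalone sketch (plain DifferentialEquations.jl, not CellBasedModels, on the agent ODE above with L frozen at 1):

import DifferentialEquations as DE

# dR/dt = aR*L - rR*R with aR = 0.4, rR = 1 and L held at 1, in Float32.
f(R, p, t) = 0.4f0*1f0 - 1f0*R
prob = DE.ODEProblem(f, 0f0, (0f0, 4f0))

sol_default = DE.solve(prob, DE.Tsit5())                             # default tolerances
sol_tight   = DE.solve(prob, DE.Tsit5(); abstol=1f-8, reltol=1f-6)   # tightened
println(sol_default(4f0), " vs ", sol_tight(4f0))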

gatocor commented 10 months ago

Anyway, as far as I understand, it is normal for GPUs to show such errors, since they do not have the same error-correction mechanisms as CPUs. Quadro cards have better error protection implemented.
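
One concrete mechanism behind such CPU/GPU discrepancies (an illustration of the general point, not necessarily the cause here): floating-point addition is not associative, so reductions evaluated in a different order, as GPU kernels typically do, need not match CPU results bit for bit:

v = rand(Float32, 10^6)
foldl(+, v) == sum(v)       # sequential vs pairwise summation: usually false
sum(v) == sum(reverse(v))   # same numbers, different order: often false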

dirypan commented 10 months ago

Would this worry you enough to say the simulation results might be unreliable? My simulation is too costly to run on the CPU.

gatocor commented 3 weeks ago

Sorry @dirypan. I would not worry much; it will always depend on the accuracy you need.

Consumer GPUs work with Float32, while CPUs work with Float64. If you need the most extreme precision, I would go for a Quadro GPU or try CPU parallelization on a server/supercomputer.
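
As a quick illustration of the Float32/Float64 gap (plain Julia, independent of any GPU): Float32 carries a 24-bit significand, so increments below one unit in the last place are silently lost.

x32 = Float32(2)^24            # 16777216f0: the limit of exact Float32 integers
x32 + 1f0 == x32               # true: the increment is rounded away entirely
Float64(2)^24 + 1.0            # 1.6777217e7: Float64 still resolves it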