Closed · jlk9 closed this 5 months ago
I call @synchronize() at the end of GradientOnEdgeModified, though.
This has no effect since you are not performing shared memory operations.
That's what I thought; I still mentioned it for completeness. I guess the KA.synchronize(backend) call should synchronize global memory, but looking at the cpu.jl file, it currently does nothing.
A yield() in the position of the @show is sufficient.
I guess the KA.synchronize(backend) call should synchronize global memory, but it currently does nothing looking
All kernel launches are synchronized on the CPU. https://github.com/JuliaGPU/KernelAbstractions.jl/blob/4285051c3ea949aee8a024383f38c8849f602729/src/cpu.jl#L89-L90
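To make the point above concrete, here is a minimal toy sketch (not code from this thread) showing the CPU-backend behavior: the launch itself already blocks, so the explicit synchronize afterwards is effectively a no-op on the CPU.

```julia
using KernelAbstractions

# Toy kernel purely for illustration
@kernel function double!(a)
    i = @index(Global)
    a[i] *= 2
end

backend = CPU()
a = collect(1.0:4.0)

# On the CPU backend the launch blocks before returning, so the
# synchronize call below has nothing left to wait for (see linked cpu.jl)
double!(backend)(a; ndrange = length(a))
KernelAbstractions.synchronize(backend)

@assert a == [2.0, 4.0, 6.0, 8.0]
```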
This is very weird: when I comment out the call to Enzyme, this spooky action at a distance disappears as well.
The same happens when moving the FD calls before the Enzyme calls. So I would still suspect Enzyme is doing something here that for some reason causes it to overlap with the FD code and corrupt some data...
Check if the Mesh is correct after Enzyme.
I'll check that! The one caveat with Enzyme being the culprit: getting rid of the MOKA mesh component and replacing it with a stand-in struct (you can see that in the commented-out portion of bug.jl) also removed the bug, even with Enzyme still there. That doesn't rule out the idea of Enzyme affecting the mesh, though.
Yeah, my one hypothesis is that Enzyme creates a task that is not waited upon... and then modifies the mesh as well... Each alone would be a very weird bug indeed, but together they make me feel ridiculous for proposing it.
Looks like the autodiff call permutes mesh.Edges.dcEdge.
Ok. So setting mesh.Edges.dcEdge to its pre-Enzyme value after calling Enzyme.autodiff makes the FD results correct:

mesh.Edges.dcEdge[:] = old_mesh.Edges.dcEdge[:]

where old_mesh is a deepcopy() of mesh before the autodiff call. Interesting that dcEdge is used in the kernel, but not modified.
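A self-contained sketch of that workaround, with a toy nested struct standing in for the MOKA mesh (the EdgeData/Mesh names here are hypothetical, and the permutation is simulated since the real one came out of the autodiff call):

```julia
# Toy stand-in for the MOKA mesh structure
struct EdgeData
    dcEdge::Vector{Float64}
end
struct Mesh
    Edges::EdgeData
end

mesh = Mesh(EdgeData([1.0, 2.0, 3.0]))
old_mesh = deepcopy(mesh)            # snapshot before Enzyme.autodiff

# ... the Enzyme.autodiff call would go here; per this thread it left
# mesh.Edges.dcEdge permuted, which we simulate:
reverse!(mesh.Edges.dcEdge)

# restore the pre-Enzyme values so the subsequent FD results come out correct
mesh.Edges.dcEdge[:] = old_mesh.Edges.dcEdge[:]
@assert mesh.Edges.dcEdge == [1.0, 2.0, 3.0]
```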
Since this bug seems to be more of an Enzyme issue I've opened one there: https://github.com/EnzymeAD/Enzyme.jl/issues/1569. Enzyme incorrectly permutes the array mesh.Edges.dcEdge, which altered the FD results.
What happens if you do autodiff but all the values are constant? Also, can you isolate the error without as many dependencies?
Per discussion here, I believe Enzyme is correct: https://github.com/EnzymeAD/Enzyme.jl/issues/1569#issuecomment-2189692029
Specifically, it behaves the same as calling the original code.
Enzyme.Const != immutable; it just means not differentiated. A variable which is Const but modified in place in the function being differentiated will also be updated in place the same way during AD.
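A small toy example of that distinction (not thread code): a Const array mutated inside the differentiated function is still mutated during AD, exactly as in a plain call.

```julia
using Enzyme

# Const marks an argument as not-differentiated; it does not freeze it.
function f!(out, x, counter)
    counter[1] += 1        # in-place update of a Const argument
    out[1] = x[1]^2
    return nothing
end

x, dx   = [3.0], [0.0]
out     = [0.0]
dout    = [1.0]
counter = [0.0]

Enzyme.autodiff(Reverse, f!, Const,
                Duplicated(out, dout), Duplicated(x, dx), Const(counter))

# dx[1] is now 6.0 (d/dx of x^2 at x = 3), and counter[1] is no longer
# 0.0: the primal pass ran the in-place update despite the Const marking
@assert dx[1] == 6.0
@assert counter[1] != 0.0
```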
Wait never mind, it looks like my computer just simply did not reproduce your issue.
What version of Enzyme/KA/etc are you using?
I'm at:
wmoses@beast:~/git/Enzyme.jl/MPAS-Ocean.jl (bug-reduce) $ ls^C
wmoses@beast:~/git/Enzyme.jl/MPAS-Ocean.jl (bug-reduce) $ ~/git/Enzyme.jl/julia-1.10.2/bin/julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.2 (2024-03-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
(MOKA) pkg> st
Project MOKA v0.1.0
Status `~/git/Enzyme.jl/MPAS-Ocean.jl/Project.toml`
⌃ [21141c5a] AMDGPU v0.9.5
[7d9f7c33] Accessors v0.1.36
[79e6a3ab] Adapt v4.0.4
[6e4b80f9] BenchmarkTools v1.5.0
[179af706] CFTime v0.1.3
[052768ef] CUDA v5.4.2
[8bb1440f] DelimitedFiles v1.9.1
[7da242da] Enzyme v0.12.20 `..`
⌃ [28b8d3ca] GR v0.73.5
⌃ [63c18a36] KernelAbstractions v0.9.20
⌃ [da04e1cc] MPI v0.20.8
[3da0fdf6] MPIPreferences v0.1.11
[85f8d34a] NCDatasets v0.14.4
[91a5bcdd] Plots v1.40.4
[295af30f] Revise v3.5.14
[09ab397b] StructArrays v0.6.18
[3a884ed6] UnPack v1.0.2
[ddb6d928] YAML v0.4.11
[7cb0a576] MPICH_jll v4.2.1+1
[ade2ca70] Dates
[56ddb016] Logging
[10745b16] Statistics v1.10.0
Info Packages marked with ⌃ have new versions available and may be upgradable.
(MOKA) pkg> ^C
julia>
wmoses@beast:~/git/Enzyme.jl/MPAS-Ocean.jl (bug-reduce) $ cd ..
wmoses@beast:~/git/Enzyme.jl (uparm) $ git log
commit 758e7b42c45d1667c774fdbef938c349a414fa40 (HEAD -> uparm, origin/uparm)
Author: William S. Moses <gh@wsmoses.com>
Date: Tue Jun 25 14:05:19 2024 -0400
uparm
commit f7ef35364f818539a494e6381100e6ac921ff339 (tag: v0.12.19, origin/main, origin/HEAD)
Author: William Moses <gh@wsmoses.com>
Date: Tue Jun 25 00:43:06 2024 -0400
Update Project.toml
grad and mesh are an array and a struct respectively, so they're duplicated objects. I tried replacing HorzMesh with a self-written struct in the file to remove the MOKA dependency; I'll try again.
@jlk9 can you try using Duplicated instead of Const? If that fixes it, it's a case of "shadow confusion", but even then I am confused about why a @show or yield() changes things.
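For reference, a minimal sketch of what the suggested change looks like, using a hypothetical ToyMesh in place of MOKA's HorzMesh:

```julia
using Enzyme

# Hypothetical stand-in for the mesh: a struct whose array field is
# read by the differentiated function
struct ToyMesh
    dcEdge::Vector{Float64}
end

norm_sq(x, mesh) = sum((x ./ mesh.dcEdge) .^ 2)

x, dx = [1.0, 2.0], [0.0, 0.0]
mesh  = ToyMesh([2.0, 4.0])
dmesh = ToyMesh(zeros(2))   # zero-initialized shadow of the same shape

# Duplicated gives Enzyme a separate shadow object to write derivatives
# into, instead of risking shadow confusion with a Const-marked mutable
Enzyme.autodiff(Reverse, norm_sq, Active,
                Duplicated(x, dx), Duplicated(mesh, dmesh))

# dx holds d/dx of sum((x ./ dcEdge).^2): here [0.5, 0.25]
@assert dx ≈ [0.5, 0.25]
```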
Mentioned in the Enzyme issue too: I reduced the example to eliminate the dependency on MOKA.jl. Now I'm just using a stand-in struct in bugMesh.jl that contains the dcEdges array.
@wsmoses Missed your question about Enzyme and KA versions, sorry! I was responding to some things on my phone. I'm on Enzyme 0.12.15 (what you get with a plain add Enzyme) and KA 0.9.20.
Replied in other issue, but repasting here (https://github.com/EnzymeAD/Enzyme.jl/issues/1569#issuecomment-2191290389)
Okay, I see what's happening here: HorzMesh should likely not be marked inactive, since you are differentiating wrt data within it. This is implicitly causing runtime-activity-style issues, but not throwing an error for them.
Let's continue this on Enzyme.jl.
The only KA-relevant part might be the @show that is needed for it to trigger.
This function computes a numerical gradient, then sums up the results using a KA kernel run on the CPU:
Computing numerical derivatives of this function produces the following output:
where Enzyme is an AD tool, and the finite difference computation uses these values:
This makes sense: finite differences and an AD tool produce very similar results. However, what if we comment out the @show statement in gradient_normSq? Then we get this result:

The AD tool still works fine, but the perturbed norm computations for FD are now wrong. Nothing was changed except removing an @show statement.

The code producing this error can be found in https://github.com/jlk9/MPAS-Ocean.jl/tree/bug-reduce, in the file bug.jl. I tried reducing it further, but getting rid of the mesh objects from the MPAS-Ocean package or the Enzyme call caused the bug to disappear. Those don't directly affect the results of normSqP and normSqM, so maybe it's an issue of timing and CPU kernels being asynchronous? I call @synchronize() at the end of GradientOnEdgeModified, though.
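For readers without the repo handy, here is a hypothetical sketch of the kernel shape described above (the name comes from this thread, but the body is invented): it reads dcEdge without ever writing it, and ends with @synchronize, which only orders shared-memory operations within a workgroup.

```julia
using KernelAbstractions

# Invented body; only the overall shape matches the thread's description
@kernel function GradientOnEdgeModified!(grad, s, dcEdge)
    i = @index(Global)
    grad[i] = s[i] / dcEdge[i]
    @synchronize()   # workgroup barrier; does not flush global memory
end

backend = CPU()
dcEdge = [2.0, 4.0]
s      = [1.0, 1.0]
grad   = zeros(2)
GradientOnEdgeModified!(backend)(grad, s, dcEdge; ndrange = 2)
KernelAbstractions.synchronize(backend)

@assert grad == [0.5, 0.25]
```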