using CUDA gives the wrong results when using Neptune.jl

I get incorrect results when running GPU operations in Neptune. My notebook is as follows:

### A Pluto.jl notebook ###
# v0.14.0

using Markdown
using InteractiveUtils

# ╔═╡ e0759dc0-00f7-11ed-33bf-798763eed380
using CUDA

# ╔═╡ 1a01d310-00f8-11ed-075b-1f3005d81023
x1 = rand(Float32,10000, 10000)

# ╔═╡ 29ba5480-00f8-11ed-2234-770480816ebb
cx1=cu(x1)

# ╔═╡ 2a58b760-00f8-11ed-096b-c9028005aa29
x2 = rand(Float32,10000, 10000)

# ╔═╡ 26c7ef80-00f8-11ed-2075-f11f7b613289
cx2=cu(x2)

# ╔═╡ 3b6e09b0-00f8-11ed-3c1c-bdd18c0c1e5f
x1*x2

# ╔═╡ 4111fa20-00f8-11ed-3510-41550273f914
cx1*cx2

# ╔═╡ Cell order:
# ╠═e0759dc0-00f7-11ed-33bf-798763eed380
# ╠═1a01d310-00f8-11ed-075b-1f3005d81023
# ╠═29ba5480-00f8-11ed-2234-770480816ebb
# ╠═2a58b760-00f8-11ed-096b-c9028005aa29
# ╠═26c7ef80-00f8-11ed-2075-f11f7b613289
# ╠═3b6e09b0-00f8-11ed-3c1c-bdd18c0c1e5f
# ╠═4111fa20-00f8-11ed-3510-41550273f914

If you run the last cell more than once, the GPU product no longer matches the CPU product, and if you keep re-running it, the resulting matrix appears to be full of zeros.
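
To put numbers on the mismatch, something like the following could be added to the notebook. This is only a minimal sketch: `matches_reference` and `zero_fraction` are hypothetical helper names, not part of CUDA.jl or Neptune.jl, and the tolerance `rtol=1f-3` is an assumed value chosen loosely for Float32 products of this size.

```julia
using LinearAlgebra

# Hypothetical diagnostic helpers (illustrative names, not from any library).

# Copy the candidate to host memory (Array(...) also accepts a CuArray) and
# compare it with the CPU reference using a relative tolerance on the norm.
function matches_reference(reference::AbstractMatrix, candidate::AbstractMatrix; rtol=1f-3)
    host = Array(candidate)
    return isapprox(reference, host; rtol=rtol)
end

# Fraction of entries that are exactly zero, to detect the
# "matrix full of zeros" symptom.
function zero_fraction(candidate::AbstractMatrix)
    host = Array(candidate)
    return count(iszero, host) / length(host)
end

# CPU-only sanity check of the helpers themselves (no GPU needed).
a = Float32[1 2; 3 4]
b = Float32[5 6; 7 8]
println(matches_reference(a * b, a * b))
println(zero_fraction(zeros(Float32, 2, 2)))
```

In the notebook above, a new cell with `matches_reference(x1*x2, cx1*cx2)` and another with `zero_fraction(cx1*cx2)` would show, on each re-run, whether the GPU result still agrees with the CPU result and how much of it has turned to zeros.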

I don't have this issue when running the same operations in VSCodium.

compleathorseplayer / Neptune.jl, issue #30