CliMA / Oceananigans.jl

🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs
https://clima.github.io/OceananigansDocumentation/stable
MIT License

Should we cap CUDA at v3.3 for now to guard against the CUDA v3.4.2 bug? #1996

Closed: tomchor closed this issue 3 years ago

tomchor commented 3 years ago

It appears that CUDA 3.4 has a bug, which apparently caused some trouble in https://github.com/CliMA/Oceananigans.jl/issues/1995 and https://github.com/CliMA/Oceananigans.jl/pull/1988.

At the moment, however, adding Oceananigans still installs the latest CUDA version since CUDA's compat entry just specifies version 3:

https://github.com/CliMA/Oceananigans.jl/blob/73be08d708131a66402eb8fc0086c47ef80a2d0e/Project.toml#L36

It seems like the fix for the bug was merged upstream, but they still haven't tagged a new release. Should we change the compat entry to protect users in the meantime? I'm not sure whether the best way is to cap the version at 3.3 or whether it's possible to exclude version 3.4.2 specifically, but I feel like it's best to act on this, no?
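To make the two options concrete, here is a rough sketch that checks which versions a few candidate compat strings would admit. The strings are only illustrations, not necessarily what the project ended up using, and `Pkg.Types.semver_spec` is an unexported Pkg helper, so its location may differ between Julia versions.

```julia
using Pkg

# Candidate [compat] strings for CUDA in Project.toml.
# These are illustrative assumptions, not the project's actual entries.
candidates = [
    "3",                       # current entry: any 3.x release
    "3.0 - 3.3",               # cap at the 3.3 series
    "3.0 - 3.4.1, 3.4.3 - 3",  # exclude only 3.4.2 (comma means union)
]

# Parse each string the way Pkg does and test a few version numbers against it.
for entry in candidates
    spec = Pkg.Types.semver_spec(entry)  # internal helper; assumed available
    println(rpad(entry, 26),
            " allows 3.3.0: ", v"3.3.0" in spec,
            ", 3.4.2: ", v"3.4.2" in spec,
            ", 3.4.3: ", v"3.4.3" in spec)
end
```

If this behaves as expected, both the cap and the exclusion keep 3.4.2 out, and only the exclusion would let users pick up a later 3.4.x release without another compat change.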

navidcy commented 3 years ago

Well, we use a Manifest.toml, so CUDA will be pinned to whatever version the Manifest specifies. So the only thing we shouldn't do is merge a PR that updates the package version in the Manifest.

tomchor commented 3 years ago

I'm pretty sure fresh installs don't necessarily reproduce the Manifest. I think that unless you pin something, Pkg will resolve to the latest set of package versions that are still compatible. In fact, I don't think committing a Manifest to the GitHub repo is even recommended (at least not according to GitHub's own gitignore template: https://github.com/github/gitignore/blob/b0012e4930d0a8c350254a3caeedf7441ea286a3/Julia.gitignore#L20-L24)

This is an example of a fresh Oceananigans install I just made. Notice it installed CUDA v3.4.2:

(@v1.6) pkg> activate .
  Activating new environment at `~/Dropbox/tests/fresh/Project.toml`

(fresh) pkg> add Oceananigans
    Updating registry at `~/.julia/registries/General`
   Resolving package versions...
   Installed ChainRulesCore ─ v1.7.2
   Installed Tables ───────── v1.6.0
   Installed Parsers ──────── v2.0.5
   Installed StaticArrays ─── v1.2.13
    Updating `~/Dropbox/tests/fresh/Project.toml`
  [9e8cae18] + Oceananigans v0.63.1
    Updating `~/Dropbox/tests/fresh/Manifest.toml`
  [621f4979] + AbstractFFTs v1.0.1
  [79e6a3ab] + Adapt v3.3.1
  [4fba245c] + ArrayInterface v3.1.33
  [ab4f0b2a] + BFloat16s v0.1.0
  [fa961155] + CEnum v0.4.1
  [179af706] + CFTime v0.1.1
  [052768ef] + CUDA v3.4.2
  [72cfdca4] + CUDAKernels v0.3.0
  [7057c7e9] + Cassette v0.3.9
  [d360d2e6] + ChainRulesCore v1.7.2
  [34da2185] + Compat v3.39.0
  [a8cc5b0e] + Crayons v4.0.4
  [7445602f] + CubedSphere v0.1.0
  [9a962f9c] + DataAPI v1.9.0
  [864edb3b] + DataStructures v0.18.10
  [e2d170a0] + DataValueInterfaces v1.0.0
  [b552c78f] + DiffRules v1.3.1
  [ffbed154] + DocStringExtensions v0.8.5
  [b305315f] + Elliptic v1.0.1
  [e2ba6199] + ExprTools v0.1.6
  [7a1cc6ca] + FFTW v1.4.5
  [5789e2e9] + FileIO v1.11.1
  [0c68f7d7] + GPUArrays v8.1.1
  [61eb1bfa] + GPUCompiler v0.12.9
  [c27321d9] + Glob v1.3.0
  [615f187c] + IfElse v0.1.0
  [92d709cd] + IrrationalConstants v0.1.0
  [82899510] + IteratorInterfaceExtensions v1.0.0
  [033835bb] + JLD2 v0.4.14
  [692b3bcd] + JLLWrappers v1.3.0
  [0f8b85d8] + JSON3 v1.9.1
  [63c18a36] + KernelAbstractions v0.7.0
  [929cbde3] + LLVM v4.6.0
  [2ab3a3ac] + LogExpFunctions v0.3.3
  [da04e1cc] + MPI v0.19.0
  [1914dd2f] + MacroTools v0.5.8
  [85f8d34a] + NCDatasets v0.11.7
  [77ba4419] + NaNMath v0.3.5
  [9e8cae18] + Oceananigans v0.63.1
  [6fe1bfb0] + OffsetArrays v1.10.7
  [bac558e1] + OrderedCollections v1.4.1
  [69de0a69] + Parsers v2.0.5
  [0e08944d] + PencilArrays v0.10.0
  [4a48f351] + PencilFFTs v0.12.5
  [21216c6a] + Preferences v1.2.2
  [74087812] + Random123 v1.4.2
  [e6cf234a] + RandomNumbers v1.5.3
  [189a3867] + Reexport v1.2.2
  [ae029012] + Requires v1.1.3
  [6038ab10] + Rotations v1.0.2
  [1bc83da4] + SafeTestsets v0.0.1
  [d496a93d] + SeawaterPolynomials v0.2.2
  [276daf66] + SpecialFunctions v1.7.0
  [aedffcd0] + Static v0.3.3
  [90137ffa] + StaticArrays v1.2.13
  [15972242] + StaticPermutations v0.3.0
  [5e0ebb24] + Strided v1.1.2
  [09ab397b] + StructArrays v0.6.3
  [856f2bd8] + StructTypes v1.7.3
  [3783bdb8] + TableTraits v1.0.1
  [bd369af6] + Tables v1.6.0
  [6aa5eb33] + TaylorSeries v0.10.13
  [a759f4b9] + TimerOutputs v0.5.13
  [3bb67fe8] + TranscodingStreams v0.9.6
  [bc48ee85] + Tullio v0.3.2
  [9d95972d] + TupleTools v1.3.0
  [f5851436] + FFTW_jll v3.3.10+0
  [0234f1f7] + HDF5_jll v1.12.0+1
  [1d5cc7b8] + IntelOpenMP_jll v2018.0.3+2
  [dad2f222] + LLVMExtra_jll v0.0.11+0
  [856f044c] + MKL_jll v2021.1.1+2
  [7cb0a576] + MPICH_jll v3.4.2+0
  [9237b28f] + MicrosoftMPI_jll v10.1.3+0
  [7243133f] + NetCDF_jll v400.702.400+0
  [fe0851c0] + OpenMPI_jll v4.1.1+2
  [458c3c95] + OpenSSL_jll v1.1.10+0
  [efe28fd5] + OpenSpecFun_jll v0.5.5+0
  [0dad84c5] + ArgTools
  [56f22d72] + Artifacts
  [2a0f44e3] + Base64
  [ade2ca70] + Dates
  [8bb1440f] + DelimitedFiles
  [8ba89e20] + Distributed
  [f43a241f] + Downloads
  [b77e0a4c] + InteractiveUtils
  [4af54fe1] + LazyArtifacts
  [b27032c2] + LibCURL
  [76f85450] + LibGit2
  [8f399da3] + Libdl
  [37e2e46d] + LinearAlgebra
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [a63ad114] + Mmap
  [ca575930] + NetworkOptions
  [44cfe95a] + Pkg
  [de0858da] + Printf
  [3fa0cd96] + REPL
  [9a3f8284] + Random
  [ea8e919c] + SHA
  [9e88b42a] + Serialization
  [1a1011a3] + SharedArrays
  [6462fe0b] + Sockets
  [2f01184e] + SparseArrays
  [10745b16] + Statistics
  [fa267f1f] + TOML
  [a4e569a6] + Tar
  [8dfed614] + Test
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
  [e66e0078] + CompilerSupportLibraries_jll
  [deac9b47] + LibCURL_jll
  [29816b5a] + LibSSH2_jll
  [c8ffd9c3] + MbedTLS_jll
  [14a3606d] + MozillaCACerts_jll
  [05823500] + OpenLibm_jll
  [83775a58] + Zlib_jll
  [8e850ede] + nghttp2_jll
  [3f19e933] + p7zip_jll
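
Until the compat entry changes, a user could hold CUDA back in their own environment. A minimal sketch, assuming a 3.3.x release is acceptable for their setup; `hold_cuda_at` is a hypothetical helper name, not part of Oceananigans or Pkg.

```julia
using Pkg

# Hypothetical helper: install the requested CUDA version in the active
# environment, then pin it so later `Pkg.update()` calls leave it alone.
# Note that this makes CUDA a direct dependency of the environment.
function hold_cuda_at(version::AbstractString = "3.3")
    Pkg.add(name = "CUDA", version = version)  # "3.3" resolves to the latest 3.3.x
    Pkg.pin("CUDA")
end
```

Calling `hold_cuda_at()` in the `fresh` environment above should replace CUDA v3.4.2 with a 3.3.x release, provided the rest of the dependency set resolves against it.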

navidcy commented 3 years ago

Good point then. Yes, let's do it.

glwagner commented 3 years ago

We have found that it's best for Oceananigans to have a static environment (one of the caveats here):

https://github.com/github/gitignore/blob/b0012e4930d0a8c350254a3caeedf7441ea286a3/Julia.gitignore#L22-L23

I agree we should pin CUDA in Project.toml.