SciML / ExponentialUtilities.jl

Fast and differentiable implementations of matrix exponentials, Krylov exponential matrix-vector multiplications ("expmv"), KIOPS, ExpoKit functions, and more. All your exponential needs in SciML form.
https://docs.sciml.ai/ExponentialUtilities/stable/

Long compile times #128

Closed by charleskawczynski 1 year ago

charleskawczynski commented 1 year ago

We're seeing very long compile times for ExponentialUtilities over at ClimaAtmos.jl:

julia> @time_imports using ClimaAtmos
[ Info: Precompiling ClimaAtmos [b2c96348-7fb7-4fe0-8da9-78d88439e717]
      0.0 ms  IfElse
      0.1 ms  BitTwiddlingConvenienceFunctions
      0.1 ms  CommonSolve
      0.1 ms  CustomUnitRanges
      0.1 ms  DataValueInterfaces
      0.1 ms  GilbertCurves
      0.1 ms  IteratorInterfaceExtensions
      0.1 ms  JuliaNVTXCallbacks_jll
      0.1 ms  NVTX_jll
      0.1 ms  PrecompileTools
      0.1 ms  Reexport
      0.1 ms  SIMDTypes
      0.1 ms  SimpleUnPack
      0.1 ms  SnoopPrecompile
      0.1 ms  TableTraits
      0.1 ms  Tricks
      0.1 ms  TruncatedStacktraces
      0.1 ms  UnPack
      0.1 ms  ZygoteRules
      0.2 ms  AxisAlgorithms
      0.2 ms  CubedSphere
      0.2 ms  Elliptic
      0.2 ms  EnumX
      0.2 ms  ExprTools
      0.2 ms  JLLWrappers
      0.2 ms  MuladdMacro
      0.2 ms  NaNMath
      0.2 ms  Requires
      0.2 ms  StatsAPI
      0.2 ms  TensorCore
      0.2 ms  TextWrap
      0.3 ms  ArtifactWrappers
      0.3 ms  CommonSubexpressions
      0.3 ms  FastClosures
      0.3 ms  GaussQuadrature
      0.3 ms  Parameters
      0.3 ms  SortingAlgorithms
      0.4 ms  FastBroadcast
      0.4 ms  MPIPreferences
      0.4 ms  OpenSpecFun_jll
      0.4 ms  ProgressBars
      0.5 ms  ChangesOfVariables
      0.5 ms  DiffRules
      0.5 ms  Inflate
      0.5 ms  LogExpFunctions
      0.5 ms  RuntimeGeneratedFunctions
      0.6 ms  Compat
      0.6 ms  SymbolicIndexingInterface
      0.7 ms  FastGaussQuadrature
      0.7 ms  FFTW_jll
      0.7 ms  FunctionWrappersWrappers
      0.8 ms  AtmosphericProfilesLibrary
      0.8 ms  ConstructionBase
      0.8 ms  HDF5_jll
      0.8 ms  HypergeometricFunctions
      0.8 ms  Rmath_jll
      0.8 ms  SciMLNLSolve
      0.9 ms  DataAPI
      0.9 ms  Dierckx
      0.9 ms  InverseFunctions
      0.9 ms  OpenLibm_jll
      1.0 ms  Zlib_jll
      1.1 ms  ClimaComms
      1.1 ms  VertexSafeGraphs
      1.2 ms  CLIMAParameters
      1.2 ms  DensityInterface
      1.2 ms  OpenSSL_jll
      1.3 ms  MosaicViews
      1.3 ms  Polyester
      1.5 ms  TerminalLoggers
      1.6 ms  ArgParse
      1.6 ms  StaticArraysCore
      1.7 ms  DiffResults
      1.8 ms  Libiconv_jll
      1.8 ms  NetCDF_jll
      1.8 ms  UnsafeAtomicsLLVM
      1.9 ms  Calculus
      1.9 ms  libblastrampoline_jll
      2.0 ms  ManualMemory
      2.0 ms  Random123
      2.1 ms  CEnum
      2.1 ms  SimpleTraits
      2.2 ms  GPUArraysCore
      2.3 ms  TranscodingStreams
      2.3 ms  XML2_jll
      2.4 ms  Graphics
      2.4 ms  QuadGK
      2.5 ms  DocStringExtensions 50.76% compilation time
      2.7 ms  LLVMExtra_jll 45.65% compilation time
      2.8 ms  NLsolve
      2.8 ms  UnsafeAtomics
      2.9 ms  RootSolvers
      3.0 ms  CloseOpenIntervals
      3.0 ms  MappedArrays
      3.1 ms  CommonDataModel
      3.2 ms  StackViews
      3.3 ms  PaddedViews
      3.4 ms  KernelAbstractions
      3.6 ms  CloudMicrophysics
      3.6 ms  LayoutPointers
      3.8 ms  SLEEFPirates
      4.0 ms  MPICH_jll
      4.4 ms  BFloat16s
      4.7 ms  Thermodynamics
      4.9 ms  Sparspak
      5.0 ms  StatsFuns
      5.1 ms  ArnoldiMethod
      5.2 ms  IrrationalConstants
      5.5 ms  ProgressLogging
      5.5 ms  YAML
      5.7 ms  ComputationalResources
      6.0 ms  CompilerSupportLibraries_jll
      6.1 ms  StringEncodings
      6.3 ms  Distances
      6.3 ms  NLSolversBase
      6.3 ms  TriangularSolve 73.98% compilation time
      6.7 ms  WoodburyMatrices
      6.8 ms  LineSearches
      6.9 ms  AbstractFFTs
      6.9 ms  DiffEqCallbacks
      7.0 ms  LeftChildRightSiblingTrees
      7.1 ms  Missings
      7.4 ms  OrderedCollections
      8.0 ms  ArrayInterfaceCore
      8.0 ms  PreallocationTools 44.42% compilation time (100% recompilation)
      8.5 ms  LDLFactorizations
     10.2 ms  FiniteDiff 35.61% compilation time (100% recompilation)
     10.5 ms  FunctionWrappers
     11.3 ms  FastLapackInterface
     11.7 ms  Preferences
     12.2 ms  SpecialFunctions
     13.3 ms  PolyesterWeave 60.98% compilation time
     13.7 ms  CatIndices
     14.3 ms  FFTViews
     14.4 ms  TiledIteration
     14.6 ms  RandomNumbers 26.08% compilation time
     15.0 ms  SparseDiffTools 28.54% compilation time (100% recompilation)
     15.4 ms  PkgVersion
     15.7 ms  RRTMGP
     15.8 ms  Atomix
     17.8 ms  SurfaceFluxes
     18.1 ms  Krylov
     18.5 ms  IntervalSets
     18.5 ms  StatsBase
     20.3 ms  AMD 88.17% compilation time
     21.1 ms  Tables
     21.3 ms  CFTime 27.14% compilation time
     21.5 ms  Lazy
     21.6 ms  CPUSummary 14.48% compilation time
     23.4 ms  IterativeSolvers
     24.3 ms  GenericSchur
     25.7 ms  LambertW 52.18% compilation time
     26.3 ms  KLU
     28.6 ms  NVTX 80.48% compilation time
     30.2 ms  ClimaTimeSteppers
     30.9 ms  PDMats
     32.1 ms  Adapt 78.04% compilation time (5% recompilation)
     32.2 ms  ThreadingUtilities 68.93% compilation time
     36.6 ms  Interpolations 9.60% compilation time (100% recompilation)
     36.8 ms  RecursiveArrayTools 17.02% compilation time (100% recompilation)
     37.7 ms  Rmath 83.77% compilation time
     40.2 ms  AbstractTrees
     40.6 ms  MacroTools
     42.2 ms  Setfield
     42.9 ms  Static
     43.2 ms  StaticArrayInterface 38.71% compilation time
     43.5 ms  TimerOutputs 14.88% compilation time
     52.3 ms  Graphs
     53.8 ms  ChainRulesCore
     54.2 ms  LinearOperators
     55.0 ms  ColorTypes 20.95% compilation time
     57.6 ms  DiffEqBase 35.07% compilation time
     62.3 ms  ArrayInterface 69.37% compilation time (33% recompilation)
     63.4 ms  RecipesBase
     64.2 ms  DataStructures
     78.7 ms  DualNumbers
     83.1 ms  FixedPointNumbers
     89.4 ms  LLVM 37.41% compilation time (100% recompilation)
     96.4 ms  ForwardDiff
     98.8 ms  TaylorSeries 3.84% compilation time
    118.4 ms  Dierckx_jll 98.64% compilation time (100% recompilation)
    140.8 ms  BlockArrays
    142.2 ms  Colors
    165.3 ms  FillArrays
    177.3 ms  SciMLOperators
    194.5 ms  StrideArraysCore 1.13% compilation time
    195.9 ms  Distributions
    218.5 ms  OffsetArrays
    220.3 ms  NonlinearSolve
    226.7 ms  Ratios 91.79% compilation time (97% recompilation)
    235.2 ms  SciMLBase 1.95% compilation time
    244.1 ms  GPUCompiler 1.20% compilation time
    244.2 ms  FFTW 4.95% compilation time (100% recompilation)
    251.5 ms  LoopVectorization
    251.5 ms  RecursiveFactorization
    313.5 ms  KrylovKit 2.10% compilation time (100% recompilation)
    340.5 ms  SimpleNonlinearSolve 1.01% compilation time
    394.0 ms  NCDatasets 9.35% compilation time (100% recompilation)
    396.3 ms  HostCPUFeatures 9.16% compilation time (100% recompilation)
    407.9 ms  ClimaCore
    422.1 ms  StaticArrays
    534.5 ms  GPUArrays
    607.3 ms  ClimaAtmos
    649.1 ms  VectorizationBase
    650.2 ms  ColorVectorSpace 0.45% compilation time
    703.3 ms  HDF5 52.74% compilation time (87% recompilation)
    741.8 ms  MPI 92.03% compilation time (89% recompilation)
    755.7 ms  ImageBase
    763.1 ms  LinearSolve 0.40% compilation time (100% recompilation)
    857.7 ms  ImageFiltering 1.60% compilation time
    917.7 ms  ArrayLayouts
   1016.8 ms  CUDA 0.54% compilation time
   1022.6 ms  JLD2 1.06% compilation time (36% recompilation)
   1023.3 ms  FileIO 0.72% compilation time (100% recompilation)
   2690.1 ms  ImageCore
   2809.6 ms  OrdinaryDiffEq
   3991.6 ms  Insolation 90.09% compilation time (74% recompilation)
   5229.1 ms  ExponentialUtilities

Title credit to @trontrytel

To reproduce:

git clone https://github.com/CliMA/ClimaAtmos.jl
cd ClimaAtmos.jl
julia --project
julia> @time_imports using ClimaAtmos

ChrisRackauckas commented 1 year ago

Interesting. Is this on the latest version? We cut a few things out recently that chop the startup time considerably, and this looks like the numbers I saw before doing that.

charleskawczynski commented 1 year ago

[[deps.ExponentialUtilities]]
deps = ["Adapt", "ArrayInterface", "GPUArraysCore", "GenericSchur", "LinearAlgebra", "Printf", "SnoopPrecompile", "SparseArrays", "libblastrampoline_jll"]
git-tree-sha1 = "fb7dbef7d2631e2d02c49e2750f7447648b0ec9b"
uuid = "d4d017d3-3776-5f7e-afef-a10c40355c18"
version = "1.24.0"

with

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores

charleskawczynski commented 1 year ago

@ChrisRackauckas what was done to reduce precompilation time before? Maybe this is somehow an issue on our side?

ChrisRackauckas commented 1 year ago

It was just overspecializing. https://github.com/SciML/ExponentialUtilities.jl/pull/119
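
For readers unfamiliar with the term: overspecialization means the compiler generates a separate compiled instance for every combination of concrete argument types, which inflates compile and load time when the extra instances buy no runtime speed. The sketch below is a generic illustration only, not the content of the linked PR; `describe_specialized` and `describe_generic` are hypothetical names.

```julia
# Hypothetical helpers, for illustration only (not taken from ExponentialUtilities).

# Default behavior: Julia may compile one specialization per concrete argument type,
# so calling this with an Int, a Float64, and a String can yield three compiled instances.
describe_specialized(x) = string("value of type ", typeof(x))

# `@nospecialize` hints that the compiler need not specialize on `x`,
# so a single compiled instance can serve every argument type.
describe_generic(@nospecialize(x)) = string("value of type ", typeof(x))

# Both versions return the same results; only the amount of compiled code differs.
for v in (1, 2.0, "three")
    @assert describe_specialized(v) == describe_generic(v)
end
```

The trade-off is that an unspecialized method may run slower because of dynamic dispatch, so the hint is best suited to code that is not performance-critical.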

charleskawczynski commented 1 year ago

Wow, thanks for digging into this!

charleskawczynski commented 1 year ago

@ChrisRackauckas, this seems to have cropped back up somehow?

julia> @time_imports using ClimaAtmos
[ Info: Precompiling ClimaAtmos [b2c96348-7fb7-4fe0-8da9-78d88439e717]
     ...
     10.1 ms  Ratios
     10.2 ms  ArnoldiMethod
     10.2 ms  LayoutPointers
     10.6 ms  CloudMicrophysics
     10.7 ms  LeftChildRightSiblingTrees
     10.9 ms  Thermodynamics
     11.0 ms  CommonDataModel
     11.0 ms  DiffEqBase
     11.1 ms  LineSearches
     11.3 ms  SpecialFunctions
     11.5 ms  Zstd_jll
     11.7 ms  ThreadingUtilities
     11.8 ms  RandomNumbers
     11.8 ms  SparseDiffTools
     12.3 ms  Sparspak
     12.4 ms  CFTime
     12.4 ms  Missings
     12.7 ms  RecipesBase
     12.8 ms  RRTMGP
     13.5 ms  TiledIteration
     13.9 ms  LinearOperators
     14.7 ms  OrderedCollections
     14.9 ms  NLSolversBase
     15.0 ms  AbstractFFTs 65.97% compilation time (100% recompilation)
     15.4 ms  ComputationalResources
     16.4 ms  Preferences
     16.4 ms  SciMLOperators
     16.5 ms  ArrayInterfaceCore
     16.7 ms  KernelAbstractions
     17.0 ms  Lazy
     17.4 ms  CatIndices
     17.5 ms  PolyesterWeave
     19.5 ms  HostCPUFeatures
     20.5 ms  CompilerSupportLibraries_jll 26.04% compilation time
     20.5 ms  MappedArrays
     22.2 ms  TimerOutputs
     22.6 ms  SurfaceFluxes
     23.2 ms  NVTX
     24.1 ms  FFTViews
     24.9 ms  KLU
     25.2 ms  DualNumbers
     25.2 ms  GenericSchur
     25.3 ms  Setfield
     25.6 ms  Tables
     26.1 ms  StatsBase
     29.4 ms  IntervalSets
     31.0 ms  FunctionWrappers
     31.4 ms  PDMats
     31.7 ms  Distances
     32.6 ms  RecursiveArrayTools
     33.9 ms  PkgVersion
     36.7 ms  ImageBase
     37.8 ms  MPI
     43.6 ms  OffsetArrays
     45.0 ms  Interpolations
     45.6 ms  ClimaTimeSteppers
     46.7 ms  Krylov
     49.4 ms  AbstractTrees
     49.6 ms  Statistics
     50.2 ms  Graphs
     50.9 ms  TaylorSeries
     51.3 ms  NetCDF_jll
     51.8 ms  Static
     57.7 ms  ForwardDiff
     58.0 ms  LLVM
     58.3 ms  ChainRulesCore
     73.9 ms  SuiteSparse_jll
     81.9 ms  GPUArrays
     93.9 ms  DataStructures
    101.0 ms  FFTW
    131.9 ms  FillArrays
    146.4 ms  ClimaCore
    171.5 ms  BlockArrays
    185.1 ms  ColorTypes
    206.8 ms  VectorizationBase
    211.5 ms  FileIO
    216.4 ms  Dierckx_jll 98.69% compilation time (100% recompilation)
    218.9 ms  LoopVectorization
    225.0 ms  ColorVectorSpace 5.06% compilation time (100% recompilation)
    226.3 ms  JLD2 4.90% compilation time
    228.9 ms  RecursiveFactorization
    231.3 ms  GPUCompiler 2.99% compilation time
    252.9 ms  StrideArraysCore
    298.7 ms  SimpleNonlinearSolve
    299.0 ms  SciMLBase 5.38% compilation time (100% recompilation)
    342.9 ms  LinearSolve
    378.0 ms  Colors
    410.0 ms  BandedMatrices
    443.5 ms  Distributions
    445.8 ms  ImageFiltering
    458.1 ms  NCDatasets
    553.7 ms  ImageCore 1.61% compilation time (100% recompilation)
    556.8 ms  ClimaAtmos
    558.7 ms  NonlinearSolve
    736.6 ms  CUDA
    778.9 ms  FixedPointNumbers
    785.1 ms  OrdinaryDiffEq
    836.3 ms  StaticArrays
   1093.0 ms  HDF5 90.87% compilation time (95% recompilation)
   1116.0 ms  ArrayLayouts
   1588.4 ms  ExponentialUtilities

I see it's not as bad as before, but it still seems pretty long.

ChrisRackauckas commented 1 year ago

Are you invalidating? Loading it on its own is much faster for me:

julia> @time_imports using ExponentialUtilities
      0.2 ms  SuiteSparse
      0.3 ms  Requires
      1.8 ms  ArrayInterface
     29.4 ms  Preferences
      0.3 ms  SnoopPrecompile
     15.1 ms  GenericSchur
      0.4 ms  Adapt
      1.5 ms  GPUArraysCore
      0.2 ms  ArrayInterface → ArrayInterfaceGPUArraysCoreExt
    241.9 ms  ExponentialUtilities

charleskawczynski commented 1 year ago

Ah, yeah, we're still seeing a lot of invalidations in Julia 1.9, but they look much better (and ExponentialUtilities compile times look much better, too) in the 1.10 beta. So we'll just wait for that release to pick up the fix, thanks!
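
As background on the invalidations mentioned above: when a package adds a more specific method for a function that already-compiled code calls, Julia discards that compiled code and recompiles it on next use. That is one way a package can take longer to load inside a large dependency tree than on its own. A minimal Base-only sketch, using the hypothetical functions `cost` and `total_cost`:

```julia
# Hypothetical functions, for illustration only.
cost(x) = 1                    # generic fallback method
total_cost(x) = cost(x) + 1    # caller; compiled against the fallback on first use

before = total_cost(1)         # compiles total_cost(::Int) using cost(x) = 1

# A more specific method is added later, as loading another package might do.
# The compiled code for total_cost(::Int) is now stale and gets invalidated.
cost(x::Int) = 10

after = total_cost(1)          # recompiled on this call, now dispatching to cost(::Int)

@assert before == 2
@assert after == 11
```

The result changes because the stale compiled code is thrown away. In a real session the visible symptom is the extra recompilation time rather than a different return value.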