Split cubic & ideal EOS (and common backbone) into separate package(s)

stillyslalom commented 2 years ago

The complexity of SAFT equations of state imposes quite a lot of overhead on this package, and that’s entirely justified for a research-grade EOS, but for simple day-to-day usage, it’d be great to have a low-overhead, fast-loading EOS package covering ideal/SRK/Peng-Robinson.

pw0908 commented 2 years ago

Splitting Clapeyron is an idea we've been thinking about for a while. This is mainly because some of the features we've been developing are getting quite niche and often warrant a package of their own. I'm still of the mindset that it's nice to have one unified package. However, as Clapeyron grows, it becomes more apparent that we should split it up.

That being said, I doubt SAFT represents a large overhead in Clapeyron? Surely something like our database parsers and methods should contribute much more. I might be wrong though.

longemen3000 commented 2 years ago

on limited testing:

julia> @time using Revise, Clapeyron #without any EoS
  8.001558 seconds (13.22 M allocations: 797.007 MiB, 5.96% gc time, 52.49% compilation time)

julia> @time using Revise, Clapeyron #with all EoS
  8.418290 seconds (13.28 M allocations: 802.694 MiB, 6.19% gc time, 50.00% compilation time)

So, the main overhead seems to be the code before the EoS (parsing and methods)

on how to split the package, the main future solution will almost surely be something like what they do in https://github.com/JuliaArrays/ArrayInterface.jl , but we still need to be the proper hierarchy of packages to load.

longemen3000 commented 2 years ago

with julia 1.8, the new macro @time_imports helps here:

julia> @time_imports using Clapeyron
      3.9 ms  StaticArraysCore
   1072.9 ms  StaticArrays
      1.9 ms  PackedVectorsOfVectors
    299.1 ms  FillArrays
      0.3 ms  CommonSolve
     57.6 ms  MacroTools
      2.0 ms  ConstructionBase
     56.6 ms  Setfield
     93.1 ms  Roots
      1.5 ms  PositiveFactorizations
     33.0 ms  NLSolvers
      4.0 ms  DataAPI
      0.6 ms  Compat
     17.4 ms  OrderedCollections
    112.6 ms  DataStructures
      0.7 ms  SortingAlgorithms
     15.0 ms  Missings
      9.5 ms  DocStringExtensions 57.54% compilation time
    162.9 ms  ChainRulesCore
      1.2 ms  ChangesOfVariables
      1.9 ms  InverseFunctions
     14.9 ms  IrrationalConstants
      1.4 ms  LogExpFunctions
      0.6 ms  StatsAPI
     46.9 ms  StatsBase
     36.2 ms  PDMats
      0.5 ms  Reexport
      0.5 ms  OpenLibm_jll
     81.5 ms  Preferences
      0.7 ms  JLLWrappers
      1.4 ms  CompilerSupportLibraries_jll
    412.9 ms  OpenSpecFun_jll 98.13% compilation time (99% recompilation)
     30.3 ms  SpecialFunctions
      1.5 ms  Rmath_jll
    198.0 ms  Rmath 91.84% compilation time
      0.6 ms  NaNMath
      3.3 ms  Calculus
     70.7 ms  DualNumbers
      1.4 ms  HypergeometricFunctions
     16.9 ms  StatsFuns
      9.4 ms  QuadGK
      2.8 ms  DensityInterface
    301.4 ms  Distributions
     25.1 ms  SpatialIndexing
      0.3 ms  CPUTime
     14.0 ms  URIs 55.69% compilation time
     10.1 ms  MbedTLS_jll
     93.9 ms  MbedTLS 60.69% compilation time
      0.7 ms  IniFile
    130.8 ms  HTTP 42.82% compilation time
    192.0 ms  Parsers 4.61% compilation time
     38.3 ms  JSON
     99.6 ms  BlackBoxOptim
      5.8 ms  DiffResults
      1.2 ms  DiffRules
      0.5 ms  CommonSubexpressions
    361.3 ms  ForwardDiff
      0.4 ms  Scratch
    872.3 ms  Unitful
      0.3 ms  DataValueInterfaces
      0.3 ms  IteratorInterfaceExtensions
      0.2 ms  TableTraits
     39.8 ms  Tables
     41.5 ms  PooledArrays
    467.7 ms  SentinelArrays 21.74% compilation time
     29.0 ms  InlineStrings
     46.9 ms  WeakRefStrings
      4.2 ms  TranscodingStreams
      0.5 ms  Zlib_jll
      4.5 ms  CodecZlib
     22.7 ms  FilePathsBase 20.75% compilation time
   3340.7 ms  CSV 80.23% compilation time (85% recompilation)
     13.4 ms  ThermoState
    270.6 ms  Clapeyron

worst offenders:

   3340.7 ms  CSV 80.23% compilation time (85% recompilation)
   1072.9 ms  StaticArrays
   872.3 ms  Unitful

i'm following https://sciml.ai/news/2022/09/21/compile_time/ btw. so it seems that most of our first time to compile is due to CSV

stillyslalom commented 2 years ago

Chris’s blog post today mentioned changing import order to put overload-heavy packages like StaticArrays later to reduce the number of methods that need to be checked when loading other dependencies. I haven’t played around with import order, and it’s a bit painful to trawl through dependency trees to figure out where method overloads are coming from, but moving StaticArrays down the list would be an easy first thing to try.

longemen3000 commented 2 years ago

yes, i'm playing with the package loading order to see the most efficient way. i found one nasty invalidation when loading any JLL after StaticArrays:

julia> @time_imports using StaticArrays,Rmath
      5.5 ms  StaticArraysCore
    753.7 ms  StaticArrays
     26.2 ms  Preferences
      0.6 ms  JLLWrappers
    452.8 ms  Rmath_jll 99.65% compilation time (99% recompilation)
    119.3 ms  Rmath 89.93% compilation time

stillyslalom commented 2 years ago

Wild! Rmath_jll load time is cut from 405 to 5 ms on my machine with the load order reversed. I didn't expect such a large impact.

longemen3000 commented 2 years ago

There are two types of work to be done:

package wise, find the combination and order that minimizes load time.
code wise, reduce the amount of invalidations and ambiguities caused by our own code. We don't invalidate that much (because we define new methods rather that using other ones) but we have quite a bit of ambiguities

One quick way to reduce package loading by half would be to remove CSV.jl and just use DelimitedFiles.jl? We use the Tables.jl interface to iterate over the parsed values, so anything Tables.jl compliant could work

longemen3000 commented 2 years ago

problems found:

longemen3000 commented 1 year ago

One idea is to bypass the table parsing directly and pass the params yourself:

model = PR(components;Tc, Pc, acentric_factor, k)

That would skip the compile first time of the DB parser completely, at the cost of needing to specify your params yourself

pw0908 commented 1 year ago

I wonder if we should allow users to define parameters manually for all our models.

That seems to be the way all other packages like ours work. Ive always found that really tedious but it's definitely easier than the csvs

ClapeyronThermo / Clapeyron.jl

Split cubic & ideal EOS (and common backbone) into separate package(s) #106