Open stillyslalom opened 2 years ago
Splitting Clapeyron is an idea we've been thinking about for a while. This is mainly because some of the features we've been developing are getting quite niche and often warrant a package of their own. I'm still of the mindset that it's nice to have one unified package. However, as Clapeyron grows, it becomes more apparent that we should split it up.
That being said, I doubt SAFT represents a large overhead in Clapeyron? Surely something like our database parsers and methods should contribute much more. I might be wrong though.
on limited testing:
julia> @time using Revise, Clapeyron #without any EoS
8.001558 seconds (13.22 M allocations: 797.007 MiB, 5.96% gc time, 52.49% compilation time)
julia> @time using Revise, Clapeyron #with all EoS
8.418290 seconds (13.28 M allocations: 802.694 MiB, 6.19% gc time, 50.00% compilation time)
So, the main overhead seems to be the code before the EoS (parsing and methods)
on how to split the package, the main future solution will almost surely be something like what they do in https://github.com/JuliaArrays/ArrayInterface.jl , but we still need to be the proper hierarchy of packages to load.
with julia 1.8, the new macro @time_imports
helps here:
julia> @time_imports using Clapeyron
3.9 ms StaticArraysCore
1072.9 ms StaticArrays
1.9 ms PackedVectorsOfVectors
299.1 ms FillArrays
0.3 ms CommonSolve
57.6 ms MacroTools
2.0 ms ConstructionBase
56.6 ms Setfield
93.1 ms Roots
1.5 ms PositiveFactorizations
33.0 ms NLSolvers
4.0 ms DataAPI
0.6 ms Compat
17.4 ms OrderedCollections
112.6 ms DataStructures
0.7 ms SortingAlgorithms
15.0 ms Missings
9.5 ms DocStringExtensions 57.54% compilation time
162.9 ms ChainRulesCore
1.2 ms ChangesOfVariables
1.9 ms InverseFunctions
14.9 ms IrrationalConstants
1.4 ms LogExpFunctions
0.6 ms StatsAPI
46.9 ms StatsBase
36.2 ms PDMats
0.5 ms Reexport
0.5 ms OpenLibm_jll
81.5 ms Preferences
0.7 ms JLLWrappers
1.4 ms CompilerSupportLibraries_jll
412.9 ms OpenSpecFun_jll 98.13% compilation time (99% recompilation)
30.3 ms SpecialFunctions
1.5 ms Rmath_jll
198.0 ms Rmath 91.84% compilation time
0.6 ms NaNMath
3.3 ms Calculus
70.7 ms DualNumbers
1.4 ms HypergeometricFunctions
16.9 ms StatsFuns
9.4 ms QuadGK
2.8 ms DensityInterface
301.4 ms Distributions
25.1 ms SpatialIndexing
0.3 ms CPUTime
14.0 ms URIs 55.69% compilation time
10.1 ms MbedTLS_jll
93.9 ms MbedTLS 60.69% compilation time
0.7 ms IniFile
130.8 ms HTTP 42.82% compilation time
192.0 ms Parsers 4.61% compilation time
38.3 ms JSON
99.6 ms BlackBoxOptim
5.8 ms DiffResults
1.2 ms DiffRules
0.5 ms CommonSubexpressions
361.3 ms ForwardDiff
0.4 ms Scratch
872.3 ms Unitful
0.3 ms DataValueInterfaces
0.3 ms IteratorInterfaceExtensions
0.2 ms TableTraits
39.8 ms Tables
41.5 ms PooledArrays
467.7 ms SentinelArrays 21.74% compilation time
29.0 ms InlineStrings
46.9 ms WeakRefStrings
4.2 ms TranscodingStreams
0.5 ms Zlib_jll
4.5 ms CodecZlib
22.7 ms FilePathsBase 20.75% compilation time
3340.7 ms CSV 80.23% compilation time (85% recompilation)
13.4 ms ThermoState
270.6 ms Clapeyron
worst offenders:
3340.7 ms CSV 80.23% compilation time (85% recompilation)
1072.9 ms StaticArrays
872.3 ms Unitful
i'm following https://sciml.ai/news/2022/09/21/compile_time/ btw. so it seems that most of our first time to compile is due to CSV
Chris’s blog post today mentioned changing import order to put overload-heavy packages like StaticArrays later to reduce the number of methods that need to be checked when loading other dependencies. I haven’t played around with import order, and it’s a bit painful to trawl through dependency trees to figure out where method overloads are coming from, but moving StaticArrays down the list would be an easy first thing to try.
yes, i'm playing with the package loading order to see the most efficient way. i found one nasty invalidation when loading any JLL after StaticArrays:
julia> @time_imports using StaticArrays,Rmath
5.5 ms StaticArraysCore
753.7 ms StaticArrays
26.2 ms Preferences
0.6 ms JLLWrappers
452.8 ms Rmath_jll 99.65% compilation time (99% recompilation)
119.3 ms Rmath 89.93% compilation time
Wild! Rmath_jll load time is cut from 405 to 5 ms on my machine with the load order reversed. I didn't expect such a large impact.
There are two types of work to be done:
One quick way to reduce package loading by half would be to remove CSV.jl and just use DelimitedFiles.jl? We use the Tables.jl interface to iterate over the parsed values, so anything Tables.jl compliant could work
One idea is to bypass the table parsing directly and pass the params yourself:
model = PR(components;Tc, Pc, acentric_factor, k)
That would skip the compile first time of the DB parser completely, at the cost of needing to specify your params yourself
I wonder if we should allow users to define parameters manually for all our models.
That seems to be the way all other packages like ours work. Ive always found that really tedious but it's definitely easier than the csvs
The complexity of SAFT equations of state imposes quite a lot of overhead on this package, and that’s entirely justified for a research-grade EOS, but for simple day-to-day usage, it’d be great to have a low-overhead, fast-loading EOS package covering ideal/SRK/Peng-Robinson.