JuliaSIMD / CPUSummary.jl


precompile `__init__` #21

Closed ranocha closed 1 year ago

ranocha commented 1 year ago

I noticed that this package has a non-negligible `using` time when looking at `@time_imports using Trixi`. I added some explicit precompile statements for Julia v1.9 and newer to reduce this.
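For reference, a minimal sketch of the kind of change this describes, assuming the workload simply exercises `__init__` (the exact statements in the PR may differ; PrecompileTools shows up as a dependency in the timings below):

```julia
# Hypothetical sketch, e.g. in src/precompile.jl: run the __init__ path inside a
# PrecompileTools workload so its compiled code lands in the Julia v1.9+ package image.
using PrecompileTools: @compile_workload

@static if VERSION >= v"1.9"
    @compile_workload begin
        __init__()
    end
end
```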

Current main branch:

$ julia --project=. -e 'import Pkg; Pkg.resolve(); Pkg.precompile(); using InteractiveUtils; @time_imports using CPUSummary'
  No Changes to `~/.julia/dev/CPUSummary/Project.toml`
  No Changes to `~/.julia/dev/CPUSummary/Manifest.toml`
      0.1 ms  IfElse
     18.6 ms  Static
      0.8 ms  CpuId
    153.9 ms  CPUSummary 96.24% compilation time

This PR:

$ julia --project=. -e 'import Pkg; Pkg.resolve(); Pkg.precompile(); using InteractiveUtils; @time_imports using CPUSummary'
  No Changes to `~/.julia/dev/CPUSummary/Project.toml`
  No Changes to `~/.julia/dev/CPUSummary/Manifest.toml`
      0.1 ms  IfElse
     18.6 ms  Static
      0.7 ms  CpuId
      9.3 ms  Preferences
      0.2 ms  PrecompileTools
     14.3 ms  CPUSummary 62.64% compilation time

This is on an old notebook with an Intel x86 CPU. The remaining compilation time seems to come from CpuId.jl, which is used in https://github.com/JuliaSIMD/CPUSummary.jl/blob/93aef785f94fe8d9a3dde70f7d09e0fcc2215232/src/x86.jl#L49-L56. Commenting out the line `ci = CpuId.cacheinclusive()` removes the remaining compilation time.
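A rough illustration of the query being discussed (assumed shape, not the verbatim source of x86.jl): CpuId is consulted for cache parameters on the x86 path, and compiling the `cacheinclusive` call appears to account for the remaining compilation time.

```julia
using CpuId

cs = CpuId.cachesize()       # sizes of the data caches, one entry per level
ci = CpuId.cacheinclusive()  # which cache levels are inclusive; the costly call here
```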

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +1.63% :tada:

Comparison is base (07e0b1a) 15.19% compared to head (d765c12) 16.82%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #21      +/-   ##
==========================================
+ Coverage   15.19%   16.82%   +1.63%
==========================================
  Files           4        5       +1
  Lines         204      208       +4
==========================================
+ Hits           31       35       +4
  Misses        173      173
```

| [Impacted Files](https://app.codecov.io/gh/JuliaSIMD/CPUSummary.jl/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaSIMD) | Coverage Δ | |
|---|---|---|
| [src/CPUSummary.jl](https://app.codecov.io/gh/JuliaSIMD/CPUSummary.jl/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaSIMD#diff-c3JjL0NQVVN1bW1hcnkuamw=) | `50.00% <100.00%> (+2.00%)` | :arrow_up: |
| [src/precompile.jl](https://app.codecov.io/gh/JuliaSIMD/CPUSummary.jl/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaSIMD#diff-c3JjL3ByZWNvbXBpbGUuamw=) | `100.00% <100.00%> (ø)` | |
| [src/x86.jl](https://app.codecov.io/gh/JuliaSIMD/CPUSummary.jl/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaSIMD#diff-c3JjL3g4Ni5qbA==) | `64.51% <100.00%> (+1.18%)` | :arrow_up: |

:umbrella: View full report in Codecov by Sentry.

chriselrod commented 1 year ago

Thanks!

> Commenting out the line `ci = CpuId.cacheinclusive()` removes the remaining compilation time.

Any idea why this isn't getting cached?

chriselrod commented 1 year ago

It'd also be nice if we could instead run this under the interpreter. I wonder how slow that would be. For quick single runs like this, it seems like it may be a waste of disk space (for example) to compile it.
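A hedged sketch of one way to get interpreter-like execution for this kind of one-shot query code (not part of this PR; the module name is hypothetical): per-module compiler options.

```julia
module HardwareQueries  # hypothetical module holding the one-shot CPU queries
    # Compile as little as possible and skip optimization; the hardware queries
    # here run once per session, so interpreting them avoids generating native code.
    Base.Experimental.@compiler_options compile=min optimize=0

    # ... one-shot CpuId queries would go here ...
end
```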

ranocha commented 1 year ago

> Thanks!
>
> > Commenting out the line `ci = CpuId.cacheinclusive()` removes the remaining compilation time.
>
> Any idea why this isn't getting cached?

Not really. But the remaining compilation time is gone with https://github.com/m-j-w/CpuId.jl/pull/57

ranocha commented 1 year ago

> It'd also be nice if we could instead run this under the interpreter. I wonder how slow that would be. For quick single runs like this, it seems like it may be a waste of disk space (for example) to compile it.

Sounds reasonable. My motivation for cutting the load times as much as possible is HPC applications with Trixi.jl. Even if something like 150 ms does not sound like much in serial, it quickly adds up to the total CPU time when running on a large number of nodes in parallel...

chriselrod commented 1 year ago

For code like this (no loops), the interpreter is probably going to be much faster than compiling + running. `@eval`ing isn't going to compile well.

This is obviously a great improvement, so I wouldn't bother looking into it more except perhaps for curiosity's sake.