ExpHP / rsp2

phonons in rust
Apache License 2.0

ThinLTO + AVX = MASSIVE compile times for rsp2-structure #47

Closed: ExpHP closed this issue 6 years ago

ExpHP commented 6 years ago

The compile time for rsp2-structure explodes from 5s to over 200s when compiled with `-Ctarget-cpu=native`. I need to look into this further and file a bug against rustc, where somebody will probably file a bug against LLVM.

All invocations begin with `cargo +nightly rustc --release --`

| rustc flags | compile time |
| --- | --- |
| `-Ctarget-cpu=native` (ivybridge) | BAD |
| `-Ctarget-feature=+sse4.2` | ok |
| `-Ctarget-feature=+avx` | BAD |
| `-Ctarget-feature=+avx -Zthinlto=no` | ok |
| `-Ctarget-feature=+avx -Ccodegen-units=1` | ok |

(note: default codegen-units was changed to 16 in the latest release (1.24))

-Ztime-passes says all the time is spent in the final LTO pass.

-Ztime-passes

```
$ cargo rustc --release -- -Ztime-passes
   Compiling rsp2-array-types v0.1.0 (file:///home/lampam/cpp/other/rust/rsp2/src/util/array-types)
   Compiling rsp2-structure v0.1.0 (file:///home/lampam/cpp/other/rust/rsp2/src/structure)
time: 0.024; rss: 48MB parsing
time: 0.000; rss: 48MB recursion limit
time: 0.000; rss: 48MB crate injection
time: 0.000; rss: 48MB plugin loading
time: 0.000; rss: 48MB plugin registration
time: 0.090; rss: 84MB expansion
time: 0.000; rss: 84MB maybe building test harness
time: 0.001; rss: 84MB maybe creating a macro crate
time: 0.002; rss: 84MB creating allocators
time: 0.001; rss: 84MB AST validation
time: 0.016; rss: 89MB name resolution
time: 0.009; rss: 89MB complete gated feature checking
time: 0.012; rss: 93MB lowering ast -> hir
time: 0.005; rss: 94MB early lint checks
time: 0.012; rss: 96MB indexing hir
time: 0.000; rss: 94MB load query result cache
time: 0.000; rss: 94MB looking for entry point
time: 0.000; rss: 94MB looking for plugin registrar
time: 0.001; rss: 94MB loop checking
time: 0.000; rss: 94MB static item recursion checking
time: 0.001; rss: 94MB attribute checking
time: 0.003; rss: 94MB stability checking
time: 0.023; rss: 108MB type collecting
time: 0.000; rss: 108MB outlives testing
time: 0.000; rss: 108MB impl wf inference
time: 0.045; rss: 123MB coherence checking
time: 0.000; rss: 123MB variance testing
time: 0.039; rss: 126MB wf checking
time: 0.018; rss: 126MB item-types checking
time: 0.429; rss: 138MB item-bodies checking
time: 0.040; rss: 139MB const checking
time: 0.014; rss: 139MB privacy checking
time: 0.001; rss: 139MB intrinsic checking
time: 0.006; rss: 139MB match checking
time: 0.003; rss: 140MB liveness checking
time: 0.236; rss: 147MB borrow checking
time: 0.001; rss: 147MB MIR borrow checking
time: 0.000; rss: 147MB MIR effect checking
time: 0.003; rss: 147MB death checking
time: 0.000; rss: 147MB unused lib feature checking
time: 0.041; rss: 147MB lint checking
time: 0.000; rss: 147MB resolving dependency formats
time: 0.047; rss: 149MB write metadata
time: 0.211; rss: 161MB translation item collection
time: 0.014; rss: 165MB codegen unit partitioning
time: 0.060; rss: 185MB llvm function passes [rsp2_structure10]
time: 0.086; rss: 191MB llvm function passes [rsp2_structure0]
time: 0.058; rss: 195MB llvm function passes [rsp2_structure7]
time: 0.035; rss: 198MB llvm function passes [rsp2_structure8]
time: 0.344; rss: 200MB llvm module passes [rsp2_structure8]
time: 0.032; rss: 200MB llvm function passes [rsp2_structure3]
time: 0.692; rss: 200MB llvm module passes [rsp2_structure7]
time: 0.039; rss: 201MB llvm function passes [rsp2_structure4]
time: 0.959; rss: 201MB llvm module passes [rsp2_structure0]
time: 0.036; rss: 203MB llvm function passes [rsp2_structure11]
time: 0.351; rss: 203MB llvm module passes [rsp2_structure3]
time: 0.033; rss: 205MB llvm function passes [rsp2_structure6]
time: 0.428; rss: 206MB llvm module passes [rsp2_structure11]
time: 0.338; rss: 206MB llvm module passes [rsp2_structure6]
time: 0.027; rss: 207MB llvm function passes [rsp2_structure2]
time: 0.029; rss: 207MB llvm function passes [rsp2_structure15]
time: 0.263; rss: 208MB llvm module passes [rsp2_structure15]
time: 0.420; rss: 208MB llvm module passes [rsp2_structure2]
time: 0.027; rss: 208MB llvm function passes [rsp2_structure1]
time: 0.028; rss: 209MB llvm function passes [rsp2_structure14]
time: 1.179; rss: 209MB llvm module passes [rsp2_structure4]
time: 0.021; rss: 209MB llvm function passes [rsp2_structure13]
time: 0.292; rss: 209MB llvm module passes [rsp2_structure1]
time: 0.213; rss: 209MB llvm module passes [rsp2_structure14]
time: 0.016; rss: 209MB llvm function passes [rsp2_structure5]
time: 1.294; rss: 209MB translate to LLVM IR
time: 0.000; rss: 209MB assert dep graph
time: 0.000; rss: 209MB serialize dep graph
time: 3.129; rss: 209MB translation
time: 0.216; rss: 191MB llvm module passes [rsp2_structure13]
time: 0.027; rss: 185MB llvm function passes [rsp2_structure12]
time: 0.027; rss: 177MB llvm function passes [rsp2_structure9]
time: 0.087; rss: 177MB llvm module passes [rsp2_structure9]
time: 0.142; rss: 177MB llvm module passes [rsp2_structure12]
time: 0.281; rss: 178MB llvm module passes [rsp2_structure5]
time: 3.790; rss: 179MB llvm module passes [rsp2_structure10]
time: 0.422; rss: 177MB LTO passes
time: 0.127; rss: 180MB codegen passes [rsp2_structure4-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.570; rss: 180MB LTO passes
time: 0.582; rss: 180MB LTO passes
time: 0.278; rss: 181MB codegen passes [rsp2_structure7-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.324; rss: 180MB LTO passes
time: 0.323; rss: 181MB codegen passes [rsp2_structure0-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.172; rss: 180MB codegen passes [rsp2_structure6-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.318; rss: 179MB LTO passes
time: 0.160; rss: 180MB LTO passes
time: 0.168; rss: 181MB codegen passes [rsp2_structure15-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.130; rss: 181MB codegen passes [rsp2_structure3-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.497; rss: 181MB LTO passes
time: 0.148; rss: 182MB codegen passes [rsp2_structure11-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.234; rss: 181MB LTO passes
time: 0.339; rss: 182MB LTO passes
time: 0.187; rss: 183MB LTO passes
time: 0.151; rss: 183MB codegen passes [rsp2_structure1-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.115; rss: 182MB codegen passes [rsp2_structure8-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.065; rss: 181MB codegen passes [rsp2_structure14-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.123; rss: 180MB LTO passes
time: 0.049; rss: 182MB codegen passes [rsp2_structure12-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.185; rss: 181MB LTO passes
time: 0.088; rss: 181MB LTO passes
time: 0.058; rss: 182MB codegen passes [rsp2_structure9-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.103; rss: 181MB codegen passes [rsp2_structure13-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.137; rss: 180MB LTO passes
time: 0.438; rss: 180MB LTO passes
time: 0.063; rss: 181MB codegen passes [rsp2_structure5-808a07d3525227e85f9922d863ae2cdd.rs]
time: 0.165; rss: 180MB codegen passes [rsp2_structure2-808a07d3525227e85f9922d863ae2cdd.rs]
time: 218.391; rss: 178MB LTO passes
time: 0.230; rss: 178MB codegen passes [rsp2_structure10-808a07d3525227e85f9922d863ae2cdd.rs]
time: 222.656; rss: 166MB LLVM passes
time: 0.000; rss: 163MB serialize work products
time: 0.004; rss: 163MB linking
    Finished release [optimized] target(s) in 227.8 secs
```

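
For anyone trying to reproduce this, a log that long is easier to read once it is sorted by time. The following is a minimal sketch, not part of rsp2, that assumes each entry has the `time: <secs>; rss: <n>MB <pass name>` shape seen above; the names `PassTiming`, `parse_line`, and `slowest` are made up for illustration. Entries such as `LLVM passes` look like enclosing totals, so the times are ranked rather than summed.

```rust
/// One entry of `-Ztime-passes` output (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct PassTiming {
    seconds: f64,
    rss_mb: u64,
    name: String,
}

/// Parses a line shaped like `time: 0.024; rss: 48MB parsing`.
/// Returns `None` for anything else (e.g. `Compiling ...` or `Finished ...`).
fn parse_line(line: &str) -> Option<PassTiming> {
    let rest = line.trim().strip_prefix("time:")?;
    let (seconds, rest) = rest.split_once(';')?;
    let rest = rest.trim().strip_prefix("rss:")?;
    let (rss, name) = rest.trim_start().split_once("MB")?;
    Some(PassTiming {
        seconds: seconds.trim().parse().ok()?,
        rss_mb: rss.trim().parse().ok()?,
        name: name.trim().to_string(),
    })
}

/// Returns the `n` slowest entries in `log`, slowest first.
fn slowest(log: &str, n: usize) -> Vec<PassTiming> {
    let mut passes: Vec<PassTiming> = log.lines().filter_map(parse_line).collect();
    passes.sort_by(|a, b| b.seconds.total_cmp(&a.seconds));
    passes.truncate(n);
    passes
}

fn main() {
    // A few lines copied from the tail of the log above.
    let log = "\
time: 0.165; rss: 180MB codegen passes [rsp2_structure2-808a07d3525227e85f9922d863ae2cdd.rs]
time: 218.391; rss: 178MB LTO passes
time: 0.230; rss: 178MB codegen passes [rsp2_structure10-808a07d3525227e85f9922d863ae2cdd.rs]
time: 222.656; rss: 166MB LLVM passes
time: 0.004; rss: 163MB linking
Finished release [optimized] target(s) in 227.8 secs";

    for p in slowest(log, 3) {
        println!("{:>9.3}s  {:>4}MB  {}", p.seconds, p.rss_mb, p.name);
    }
}
```

In practice the whole log would be read from a file or stdin instead of the inline sample.
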
ExpHP commented 6 years ago

Updated my nightly and things are ok now. Looks like whatever was causing this is already fixed.