Closed MichaelMauderer closed 2 years ago
Wow, that orders of magnitude difference (3 to 112 seconds) on duplicate-function-elimination
is very shocking... at least it should be easy to debug since it's so big (and also convenient it is the first pass to run).
I would try using the linux perf
command on it (perf record wasm-opt [..]
, stop it after that pass, then perf report
). That will hopefully show something useful.
One possible guess is that that pass hashes function contents. Perhaps the hash function in the STL on that machine is very slow, somehow? We do depend on system libraries there, so system differences can matter.
Another thing to check is whether the slow build is a debug build - maybe wasm-pack
builds it at -O0
by mistake? But I wouldn't expect that to cause such a big difference on a specific pass and not everything.
Finally, I'd make sure the machine has enough memory, as maybe that pass uses more than others, and it starts to swap. Can try to build with BINARYEN_CORES=1
or 2 or 4, etc., which will limit parallelism but also save memory (if there are more cores).
Relevant to #4165
It seems that was-pack
ships a wasm-opt
without debug symbols. So, not yet anything useful to report from perf
until I can test another build of wasm-opt
that has debug symbols.
But just setting BINARYEN_CORES=1
makes a big difference:
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[PassRunner] running passes...
[PassRunner] running pass: duplicate-function-elimination... 17.8452 seconds.
[PassRunner] (validating)
[PassRunner] running pass: ssa-nomerge... 5.18185 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dce... 2.21731 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 1.53555 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 0.349529 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 1.74303 seconds.
[PassRunner] (validating)
[PassRunner] running pass: pick-load-signs... 0.409768 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 0.935715 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-pushing... 0.516302 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals-nostructure... 3.30613 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 2.22659 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 0.994874 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 1.2351 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 3.84092 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals... 4.50136 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 1.99572 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 0.563741 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 3.5434 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 0.587189 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 1.97074 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-folding... 3.17574 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.54356 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 1.37743 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 0.339822 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.26992 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 0.919477 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 1.28952 seconds.
[PassRunner] (validating)
[PassRunner] running pass: rse... 2.12282 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 2.10949 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dae-optimizing... 19.0256 seconds.
[PassRunner] (validating)
[PassRunner] running pass: inlining-optimizing... 96.7644 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-function-elimination... 6.48776 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-import-elimination... 0.0143792 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-globals-optimizing... 0.841014 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-module-elements... 0.471183 seconds.
[PassRunner] (validating)
[PassRunner] running pass: memory-packing... 0.0210139 seconds.
[PassRunner] (validating)
[PassRunner] running pass: directize... 0.0128064 seconds.
[PassRunner] (validating)
[PassRunner] running pass: generate-stack-ir... 0.235107 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-stack-ir... 8.36705 seconds.
[PassRunner] (validating)
[PassRunner] passes took 201.888 seconds.
vs BINARYEN_CORES=16
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[PassRunner] running passes...
[PassRunner] running pass: duplicate-function-elimination... 112.663 seconds.
[PassRunner] (validating)
[PassRunner] running pass: ssa-nomerge... 5.85834 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dce... 2.43297 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 1.54267 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 0.365984 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 1.59643 seconds.
[PassRunner] (validating)
[PassRunner] running pass: pick-load-signs... 0.41725 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 0.992736 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-pushing... 0.493042 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals-nostructure... 3.78044 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 2.45175 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 1.04343 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 1.21475 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 4.07825 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals... 4.50737 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 2.14361 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 0.595855 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 3.85174 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 0.588754 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 2.13316 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-folding... 3.28297 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.65314 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 1.41555 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 0.373972 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.35225 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 0.975925 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 1.44795 seconds.
[PassRunner] (validating)
[PassRunner] running pass: rse... 2.71446 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 2.3204 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dae-optimizing... 121.434 seconds.
[PassRunner] (validating)
[PassRunner] running pass: inlining-optimizing... 374.782 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-function-elimination... 45.7073 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-import-elimination... 0.00285077 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-globals-optimizing... 5.20232 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-module-elements... 0.479619 seconds.
[PassRunner] (validating)
[PassRunner] running pass: memory-packing... 0.00903188 seconds.
[PassRunner] (validating)
[PassRunner] running pass: directize... 0.00209986 seconds.
[PassRunner] (validating)
[PassRunner] running pass: generate-stack-ir... 0.269585 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-stack-ir... 9.84304 seconds.
[PassRunner] (validating)
[PassRunner] passes took 726.02 seconds.
Memory does not seem to an issue, though. It never seems to go even beyond 1GB used in either case.
An interesting thing is that we are also testing that on macOS and the same code that @MichaelMauderer is posting about is taking 30-45 seconds (!!!) on M1 macOS, which is crazily fast in comparison to Linux and Windows machines.
So maybe this is actually related to the version of wasm-opt
that is used in wasm-bindgen
Using a newly built version from this repo I see much more reasonable results, where more cores are faster at least. These are both from Manjaro.
BINARYEN_CORES=16
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[PassRunner] running passes
[PassRunner] running pass: duplicate-function-elimination... 2.15518 seconds.
[PassRunner] (validating)
[PassRunner] running pass: memory-packing... 0.727007 seconds.
[PassRunner] (validating)
[PassRunner] running pass: once-reduction... 0.804615 seconds.
[PassRunner] (validating)
[PassRunner] running pass: ssa-nomerge... 4.66156 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dce... 2.29155 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 1.03365 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 2.44295 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 1.0565 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 2.56435 seconds.
[PassRunner] (validating)
[PassRunner] running pass: pick-load-signs... 1.13374 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 2.02423 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-pushing... 1.22919 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals-nostructure... 3.97407 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.37542 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 1.29458 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 2.11653 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 3.59432 seconds.
[PassRunner] (validating)
[PassRunner] running pass: local-cse... 2.11877 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals... 5.19623 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.09495 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 1.22572 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 3.22234 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 1.22349 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.21541 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-folding... 3.33701 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.75992 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 2.36783 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 1.12842 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.65416 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 1.98178 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 2.43883 seconds.
[PassRunner] (validating)
[PassRunner] running pass: rse... 2.54077 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.36713 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dae-optimizing... 4.46666 seconds.
[PassRunner] (validating)
[PassRunner] running pass: inlining-optimizing... 105.261 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-function-elimination... 1.15851 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-import-elimination... 0.691016 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-globals-optimizing... 0.913862 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-module-elements... 1.12312 seconds.
[PassRunner] (validating)
[PassRunner] running pass: directize... 0.755405 seconds.
[PassRunner] (validating)
[PassRunner] running pass: generate-stack-ir... 1.01954 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-stack-ir... 10.5984 seconds.
[PassRunner] (validating)
[PassRunner] passes took 202.34 seconds.
[PassRunner] (final validation)
11.98% wasm-opt libbinaryen.so [.] wasm::validateBinaryenIR(wasm::Module&, wasm::ValidationInfo&)::BinaryenIRValidator::visitExpression ◆
7.90% wasm-opt libbinaryen.so [.] std::_Hashtable<unsigned long, std::pair<unsigned long const, bool>, std::allocator<std::pair<unsigned long const, bool> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std▒
5.82% wasm-opt libbinaryen.so [.] std::_Hashtable<wasm::Expression*, wasm::Expression*, std::allocator<wasm::Expression*>, std::__detail::_Identity, std::equal_to<wasm::Expression*>, std::hash<wasm::Expression*>, std::▒
4.16% wasm-opt libbinaryen.so [.] wasm::PostWalker<wasm::BranchUtils::BranchSeeker, wasm::UnifiedExpressionVisitor<wasm::BranchUtils::BranchSeeker, void> >::scan ▒
2.89% wasm-opt libbinaryen.so [.] wasm::validateBinaryenIR ▒
2.77% wasm-opt libc.so.6 [.] unlink_chunk.constprop.0 ▒
2.61% wasm-opt libc.so.6 [.] malloc_consolidate ▒
2.30% wasm-opt libc.so.6 [.] _int_malloc ▒
2.10% wasm-opt libbinaryen.so [.] wasm::Block::finalize ▒
2.10% wasm-opt libbinaryen.so [.] std::_Hashtable<unsigned long, std::pair<unsigned long const, unsigned char>, std::allocator<std::pair<unsigned long const, unsigned char> >, std::__detail::_Select1st, std::equal_to<u▒
1.92% wasm-opt libbinaryen.so [.] std::vector<wasm::Walker<wasm::BranchUtils::BranchSeeker, wasm::UnifiedExpressionVisitor<wasm::BranchUtils::BranchSeeker, void> >::Task, std::allocator<wasm::Walker<wasm::BranchUtils::▒
1.70% wasm-opt libbinaryen.so [.] wasm::CoalesceLocals::pickIndicesFromOrder ▒
1.66% wasm-opt libc.so.6 [.] __libc_malloc ▒
1.55% wasm-opt libc.so.6 [.] _int_free ▒
1.45% wasm-opt libc.so.6 [.] __strcmp_avx2 ▒
0.93% wasm-opt libbinaryen.so [.] wasm::LocalGraphInternal::Flower::flow ▒
0.93% wasm-opt libbinaryen.so [.] wasm::PostWalker<wasm::BranchUtils::getBranchTargets(wasm::Expression*)::Scanner, wasm::UnifiedExpressionVisitor<wasm::BranchUtils::getBranchTargets(wasm::Expression*)::Scanner, void> ▒
0.84% wasm-opt libc.so.6 [.] __libc_free ▒
0.76% wasm-opt libstdc++.so.6.0.29 [.] std::_Rb_tree_insert_and_rebalance ▒
0.75% wasm-opt libbinaryen.so [.] wasm::Walker<wasm::BranchUtils::BranchSeeker, wasm::UnifiedExpressionVisitor<wasm::BranchUtils::BranchSeeker, void> >::pushTask ▒
0.69% wasm-opt libbinaryen.so [.] wasm::TypeUpdater::visitExpression ▒
0.68% wasm-opt libbinaryen.so [.] wasm::BranchUtils::operateOnScopeNameUses<wasm::BranchUtils::operateOnScopeNameUsesAndSentTypes<wasm::BranchUtils::BranchSeeker::visitExpression(wasm::Expression*)::{lambda(wasm::Name&▒
0.62% wasm-opt libbinaryen.so [.] std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, unsigned char>, std::allocator<std::pair<unsigned long const, unsigned char> >, std::__detail::_Select1st, std::e▒
0.58% wasm-opt libbinaryen.so [.] wasm::PostWalker<wasm::validateBinaryenIR(wasm::Module&, wasm::ValidationInfo&)::BinaryenIRValidator, wasm::UnifiedExpressionVisitor<wasm::validateBinaryenIR(wasm::Module&, wasm::Valid▒
BINARYEN_CORES=1
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[PassRunner] running passes
[PassRunner] running pass: duplicate-function-elimination... 5.94185 seconds.
[PassRunner] (validating)
[PassRunner] running pass: memory-packing... 0.699456 seconds.
[PassRunner] (validating)
[PassRunner] running pass: once-reduction... 0.972953 seconds.
[PassRunner] (validating)
[PassRunner] running pass: ssa-nomerge... 4.47582 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dce... 2.26037 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 1.02758 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 2.38387 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 1.06008 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 2.56524 seconds.
[PassRunner] (validating)
[PassRunner] running pass: pick-load-signs... 1.12496 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 1.9688 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-pushing... 1.19446 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals-nostructure... 3.924 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.23851 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 1.28146 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 2.02873 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 3.53683 seconds.
[PassRunner] (validating)
[PassRunner] running pass: local-cse... 2.13155 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals... 5.09726 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.07781 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 1.17475 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 2.94784 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 1.1603 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.03909 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-folding... 3.04993 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.65812 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 2.24237 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 1.0186 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.57216 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 1.95945 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 2.35517 seconds.
[PassRunner] (validating)
[PassRunner] running pass: rse... 2.45102 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.11829 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dae-optimizing... 27.7428 seconds.
[PassRunner] (validating)
[PassRunner] running pass: inlining-optimizing... 314.137 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-function-elimination... 2.75518 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-import-elimination... 0.686738 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-globals-optimizing... 2.14635 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-module-elements... 1.08174 seconds.
[PassRunner] (validating)
[PassRunner] running pass: directize... 0.925092 seconds.
[PassRunner] (validating)
[PassRunner] running pass: generate-stack-ir... 0.990785 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-stack-ir... 10.6958 seconds.
[PassRunner] (validating)
[PassRunner] passes took 438.9 seconds.
[PassRunner] (final validation)
Overhead Command Shared Object Symbol
12.09% wasm-opt libbinaryen.so [.] wasm::validateBinaryenIR(wasm::Module&, wasm::ValidationInfo&)::BinaryenIRValidator::visitExpression ◆
7.76% wasm-opt libbinaryen.so [.] std::_Hashtable<unsigned long, std::pair<unsigned long const, bool>, std::allocator<std::pair<unsigned long const, bool> >, std::__detail::_Select1st, std::equal_to<unsigned long>, std▒
5.87% wasm-opt libbinaryen.so [.] std::_Hashtable<wasm::Expression*, wasm::Expression*, std::allocator<wasm::Expression*>, std::__detail::_Identity, std::equal_to<wasm::Expression*>, std::hash<wasm::Expression*>, std::▒
4.12% wasm-opt libbinaryen.so [.] wasm::PostWalker<wasm::BranchUtils::BranchSeeker, wasm::UnifiedExpressionVisitor<wasm::BranchUtils::BranchSeeker, void> >::scan ▒
3.84% wasm-opt libc.so.6 [.] _int_malloc ▒
3.59% wasm-opt libc.so.6 [.] unlink_chunk.constprop.0 ▒
3.03% wasm-opt libc.so.6 [.] malloc_consolidate ▒
2.99% wasm-opt libbinaryen.so [.] wasm::validateBinaryenIR ▒
2.16% wasm-opt libbinaryen.so [.] wasm::Block::finalize ▒
2.08% wasm-opt libbinaryen.so [.] std::_Hashtable<unsigned long, std::pair<unsigned long const, unsigned char>, std::allocator<std::pair<unsigned long const, unsigned char> >, std::__detail::_Select1st, std::equal_to<u▒
1.92% wasm-opt libbinaryen.so [.] std::vector<wasm::Walker<wasm::BranchUtils::BranchSeeker, wasm::UnifiedExpressionVisitor<wasm::BranchUtils::BranchSeeker, void> >::Task, std::allocator<wasm::Walker<wasm::BranchUtils::▒
1.53% wasm-opt libbinaryen.so [.] wasm::CoalesceLocals::pickIndicesFromOrder ▒
1.37% wasm-opt libc.so.6 [.] __strcmp_avx2 ▒
1.26% wasm-opt libc.so.6 [.] _int_free ▒
1.16% wasm-opt libc.so.6 [.] __libc_malloc ▒
0.94% wasm-opt libbinaryen.so [.] wasm::LocalGraphInternal::Flower::flow ▒
0.92% wasm-opt libbinaryen.so [.] wasm::PostWalker<wasm::BranchUtils::getBranchTargets(wasm::Expression*)::Scanner, wasm::UnifiedExpressionVisitor<wasm::BranchUtils::getBranchTargets(wasm::Expression*)::Scanner, void> ▒
0.90% wasm-opt libc.so.6 [.] __libc_free ▒
0.74% wasm-opt libbinaryen.so [.] wasm::Walker<wasm::BranchUtils::BranchSeeker, wasm::UnifiedExpressionVisitor<wasm::BranchUtils::BranchSeeker, void> >::pushTask ▒
0.70% wasm-opt libstdc++.so.6.0.29 [.] std::_Rb_tree_insert_and_rebalance ▒
0.68% wasm-opt libbinaryen.so [.] wasm::BranchUtils::operateOnScopeNameUses<wasm::BranchUtils::operateOnScopeNameUsesAndSentTypes<wasm::BranchUtils::BranchSeeker::visitExpression(wasm::Expression*)::{lambda(wasm::Name&▒
0.67% wasm-opt libbinaryen.so [.] wasm::TypeUpdater::visitExpression ▒
0.62% wasm-opt libbinaryen.so [.] std::__detail::_Map_base<unsigned long, std::pair<unsigned long const, unsigned char>, std::allocator<std::pair<unsigned long const, unsigned char> >, std::__detail::_Select1st, std::e▒
0.59% wasm-opt libbinaryen.so [.] wasm::PostWalker<wasm::validateBinaryenIR(wasm::Module&, wasm::ValidationInfo&)::BinaryenIRValidator, wasm::UnifiedExpressionVisitor<wasm::validateBinaryenIR(wasm::Module&, wasm::Valid▒
0.53% wasm-opt libbinaryen.so [.] wasm::PostWalker<wasm::EffectAnalyzer::InternalAnalyzer, wasm::OverriddenVisitor<wasm::EffectAnalyzer::InternalAnalyzer, void> >::scan ▒
Interesting that the wasm-pack
build is so slow on duplicate-function-elimination
with additional cores. The parallel part there is pretty trivial (no atomics or mutexes or anything - it just does a simple replacement of function names in parallel, using the fact that functions have disjoint data). So I'm not sure what's slow there, but maybe profiling that one pass specifically could show something.
Running on Windows with 16 cores and the latest binaries (107).
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[PassRunner] running passes
[PassRunner] running pass: duplicate-function-elimination... 1.10864 seconds.
[PassRunner] (validating)
[PassRunner] running pass: memory-packing... 0.0294545 seconds.
[PassRunner] (validating)
[PassRunner] running pass: once-reduction... 0.0611963 seconds.
[PassRunner] (validating)
[PassRunner] running pass: ssa-nomerge... 3.47568 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dce... 1.92887 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 0.582926 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 2.23225 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 0.469429 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 3.57925 seconds.
[PassRunner] (validating)
[PassRunner] running pass: pick-load-signs... 0.536011 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 3.40764 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-pushing... 0.716726 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals-nostructure... 15.432 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.31819 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 0.712375 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 1.74165 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 3.67034 seconds.
[PassRunner] (validating)
[PassRunner] running pass: local-cse... 2.65892 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals... 21.2357 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.19059 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 0.626389 seconds.
[PassRunner] (validating)
[PassRunner] running pass: coalesce-locals... 2.73532 seconds.
[PassRunner] (validating)
[PassRunner] running pass: reorder-locals... 0.625499 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.10039 seconds.
[PassRunner] (validating)
[PassRunner] running pass: code-folding... 2.79751 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.66299 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 2.01292 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 0.443504 seconds.
[PassRunner] (validating)
[PassRunner] running pass: merge-blocks... 1.61059 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 3.3332 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 3.17921 seconds.
[PassRunner] (validating)
[PassRunner] running pass: rse... 1.62575 seconds.
[PassRunner] (validating)
[PassRunner] running pass: vacuum... 3.19819 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dae-optimizing... 4.84012 seconds.
[PassRunner] (validating)
[PassRunner] running pass: inlining-optimizing... 213.848 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-function-elimination... 0.37251 seconds.
[PassRunner] (validating)
[PassRunner] running pass: duplicate-import-elimination... 0.002494 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-globals-optimizing... 0.394977 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-module-elements... 0.471014 seconds.
[PassRunner] (validating)
[PassRunner] running pass: directize... 0.041338 seconds.
[PassRunner] (validating)
[PassRunner] running pass: generate-stack-ir... 0.482444 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-stack-ir... 11.2527 seconds.
[PassRunner] (validating)
[PassRunner] passes took 328.745 seconds.
[PassRunner] (final validation)
So, it seems the huge difference is gone in the current version.
It seems that
wasm-opt
on my machine has serious differences in performance when running either on Windows 10 or Manjaro.I'm using
wasm-opt
throughwasm-pack
on a non-trivial Rust project. On Windows, thewasm-opt
pass takes 133.157 seconds, on the same machine, different partition running Manjaro the same compile takes 728.146 seconds.This is the output when setting
BINARYEN_PASS_DEBUG=1
for Windowsand on Manjaro
The specs of the machine used: Processor AMD Ryzen Threadripper 2950X 16-Core Processor 3.50 GHz Installed RAM 64.0 GB (63.9 GB usable)
Is there a known reason for this performance difference? My preferred work environment is Manjaro, but the time it takes to run
wasm-opt
there is killing my workflow, and I do need to work with the fully optimized output at the moment.