Qqwy / elixir-type_check

TypeCheck: Fast and flexible runtime type-checking for your Elixir projects.
MIT License
521 stars 25 forks source link

Question about Compilation Time #110

Open aj-foster opened 2 years ago

aj-foster commented 2 years ago

Hello there!

I enjoyed your talk at the Amsterdam Elixir meetup, and decided to give TypeCheck a try on our codebase. We have a few structs with a relatively large number of fields, and their old @type t definitions fully specified each field. Upon changing these to @type! and implementing a few @spec!s on functions that used them, we quickly found that our compilation time rose to 3–4 minutes.

For reference, the rough setup is:

A.t() :: %A{...[16 fields]...}
B.t() :: %B{...[8 fields]...}
C.t() :: %B{a: A.t(), b: B.t()}

With six function specifications on module C that reference all three types in various combinations.

So, the question: is this expected? I can imagine that the sheer number of fields on each struct, especially when nested, creates an explosion of code generation. In the documentation you mention:

In the case of for instance a large collection or a deeply-nested 'tree of structs' [runtime checks] might still be too slow.

Could this be true for compile time as well?

More generally, do you have any recommendations on strategies for type-checking structs? Perhaps the full specification of each field would make sense on a constructor function, but the canonical t type should stop at %__MODULE__{} with no fields.

Thanks for your work! Excited for more people to see it at ElixirConf EU.

Qqwy commented 2 years ago

:thinking: Compilation time being affected this badly is not something that is expected nor intended. My best guess as to what causes it, is that the current approach of fully inlining the type-check at each of the places where it is used combines badly with having nested structs containing many fields.

A potential solution would then be to add an option to TypeCheck to not inline certain kinds of checks. But before looking into this, we first need to find out whether this indeed is the problem.

One thing you could try, is to compare the difference in compile times between enabling/disabling runtime checks in the various modules, to see whether the problem indeed originates from the runtime checks being generated (but I'm quite certain that it does).

One more thing you could try, just to test the difference in compile-time, is to see how fast/slow compilation is with version 0.9.x of the codebase (vs the current, 0.10.4 version): Because we want to support type-checking passed-in functions, a number of the type-checks that are generated were rewritten to more complex forms in 0.10.x (and I have not gotten around to switch between the two based on whether there actually is a function value inside). This would be an interesting datapoint, but I don't recommend using version 0.9.x in production because there are a number of bugfixes (only) in newer versions.

And besides this, I'll try to come up with a minimal example as a separate public repo, so it's easier to profile and reason about.

As for the general strategy w.r.t. type-checking 'large' structs: I think that 'check them fully everywhere' is a good default. Being able to override it is however important for performance (moreso runtime performance, but as we see now, likely compile-time as well). For many structs in our codebases we can (up to a point) guarantee that they are only constructed by e.g. a MyModule.new as you also mention. It might make sense to allow people to configure 'check depth' between:

Even nicer might be to perform checks based on the opaqueness of a type's typespec. Unfortunately, in practice this leads into a mire where Dialyzer and Elixir's defaults don't work well together as checking whether a value even is a particular struct is strictly speaking not allowed on opaque types. (c.f. https://github.com/elixir-lang/elixir/issues/8463).

Some discussion about this also happened in the past in the ElixirForum topic, but it was not the only thing we talked about, so it might be a difficult to look through the conversation.

Qqwy commented 2 years ago

Which Elixir and OTP versions are you using by the way?

aj-foster commented 2 years ago

Apologies for the delay (CodeBEAM travel, etc.). I was on OTP 24.3.3 with Elixir 1.12.3 for the original test. Haven't yet tested the items you mentioned, but will put it in my mental queue.

aj-foster commented 2 years ago

I created a smaller test case repository at aj-foster/type_check_example.

With it, here are some test results:

TypeCheck Compilation Benchmarks
(mix clean; time mix compile)

0.12.0 with default settings

mix compile  27.02s user 1.75s system 103% cpu 27.925 total
mix compile  26.98s user 1.78s system 103% cpu 27.881 total
mix compile  26.88s user 1.70s system 103% cpu 27.723 total

0.12.0 with runtime checks disabled for main module

mix compile  0.79s user 0.63s system 139% cpu 1.016 total
mix compile  0.82s user 0.73s system 156% cpu 0.994 total
mix compile  0.80s user 0.72s system 163% cpu 0.932 total

0.12.0 with runtime checks disabled for struct modules

mix compile  29.43s user 1.84s system 102% cpu 30.466 total
mix compile  27.72s user 1.81s system 103% cpu 28.639 total
mix compile  26.95s user 1.65s system 101% cpu 28.061 total

0.12.0 with runtime checks disabled for all modules

mix compile  0.82s user 0.72s system 154% cpu 0.992 total
mix compile  0.80s user 0.72s system 151% cpu 1.005 total
mix compile  0.79s user 0.65s system 158% cpu 0.907 total

0.9.0 with default settings

mix compile  10.02s user 1.55s system 105% cpu 10.962 total
mix compile   9.83s user 1.51s system 106% cpu 10.630 total
mix compile   9.69s user 1.51s system 106% cpu 10.556 total

0.9.0 with runtime checks disabled for main module

mix compile  0.76s user 0.70s system 167% cpu 0.872 total
mix compile  0.75s user 0.70s system 163% cpu 0.889 total
mix compile  0.73s user 0.63s system 151% cpu 0.899 total

All of these tests were run on Elixir 1.13.3 and OTP 24.3.2.

Qqwy commented 2 years ago

Thank you for this example! I can reproduce the slow compilation times on my machine. This is indeed unsustainably slow, and something that we'll look to improve.

A 'simple' thing that I will spend some time on in the near future, is to re-introduce the optimization to the checking-code that version 0.9.0 had. This should result in a similar reduction in compile-time w.r.t. that version at least. (But still I find 10s to be too slow, so after that we'll look into other things as well.)

Qqwy commented 2 years ago

Leaving this here for myself to continue debugging later. These instructions are based on this guide:

Clearly, the beam_ssa_opt pass is the slow pass (especially the ssa_opt_sink sub-pass related to delaying tuple-element extraction).

$ mix deps.compile
$ echo "" >> lib/typecheck_example.ex
$ ERL_COMPILER_OPTIONS=time mix compile
Compiling /.../elixir/cloned_repos/type_check_example/mix.exs
 get_module_name_from_core     :      0.000 s      38.5 kB
 core_lint_module              :      0.000 s      39.1 kB
 sys_core_fold                 :      0.000 s      32.1 kB
 sys_core_alias                :      0.000 s      32.1 kB
 core_transforms               :      0.000 s      32.1 kB
 sys_core_bsm                  :      0.000 s      32.1 kB
 v3_kernel                     :      0.000 s      33.1 kB
 beam_kernel_to_ssa            :      0.000 s      20.6 kB
 beam_ssa_bool                 :      0.000 s      20.6 kB
 beam_ssa_share                :      0.000 s      19.6 kB
 beam_ssa_recv                 :      0.000 s      19.6 kB
 beam_ssa_bsm                  :      0.000 s      19.9 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.000 s  78 %
    annotate_context_parameters:      0.000 s   8 %
    combine_matches            :      0.000 s   6 %
    accept_context_args        :      0.000 s   4 %
    skip_outgoing_tail_extracti:      0.000 s   3 %
 beam_ssa_funs                 :      0.000 s      19.9 kB
 beam_ssa_opt                  :      0.001 s      20.1 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :      0.000 s  37 %
    ssa_opt_live               :      0.000 s  15 %
    ssa_opt_type_continue      :      0.000 s  14 %
    ssa_opt_dead               :      0.000 s   8 %
    ssa_opt_cse                :      0.000 s   7 %
    ssa_opt_tail_phis          :      0.000 s   3 %
    ssa_opt_merge_blocks       :      0.000 s   2 %
    ssa_opt_trim_unreachable   :      0.000 s   2 %
    ssa_opt_tuple_size         :      0.000 s   1 %
    ssa_opt_record             :      0.000 s   1 %
    ssa_opt_split_blocks       :      0.000 s   1 %
    ssa_opt_element            :      0.000 s   1 %
    ssa_opt_linearize          :      0.000 s   1 %
    ssa_opt_coalesce_phis      :      0.000 s   1 %
    ssa_opt_float              :      0.000 s   1 %
    ssa_opt_bsm                :      0.000 s   1 %
    ssa_opt_tail_calls         :      0.000 s   1 %
    ssa_opt_sink               :      0.000 s   1 %
    ssa_opt_try                :      0.000 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_bs_puts            :      0.000 s   0 %
    ssa_opt_ne                 :      0.000 s   0 %
    ssa_opt_get_tuple_element  :      0.000 s   0 %
    ssa_opt_sw                 :      0.000 s   0 %
    ssa_opt_bc_size            :      0.000 s   0 %
    ssa_opt_bsm_shortcut       :      0.000 s   0 %
    ssa_opt_blockify           :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.000 s      20.1 kB
 beam_ssa_pre_codegen          :      0.000 s      22.0 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    linear_scan                :      0.000 s  19 %
    reserve_regs               :      0.000 s  17 %
    live_intervals             :      0.000 s  12 %
    place_frames               :      0.000 s   7 %
    use_set_tuple_element      :      0.000 s   6 %
    sanitize                   :      0.000 s   6 %
    reserve_yregs              :      0.000 s   5 %
    find_yregs                 :      0.000 s   5 %
    assert_no_critical_edges   :      0.000 s   4 %
    frame_size                 :      0.000 s   3 %
    turn_yregs                 :      0.000 s   3 %
    number_instructions        :      0.000 s   3 %
    fix_bs                     :      0.000 s   2 %
    copy_retval                :      0.000 s   2 %
    match_fail_instructions    :      0.000 s   2 %
    opt_get_list               :      0.000 s   2 %
    fix_receives               :      0.000 s   1 %
    legacy_bs                  :      0.000 s   0 %
 beam_ssa_codegen              :      0.000 s      17.5 kB
 beam_validator_strong         :      0.000 s      17.5 kB
 beam_a                        :      0.000 s      17.2 kB
 beam_block                    :      0.000 s      18.5 kB
 beam_jump                     :      0.000 s      18.4 kB
 beam_peep                     :      0.000 s      18.4 kB
 beam_clean                    :      0.000 s      18.4 kB
 beam_trim                     :      0.000 s      18.4 kB
 beam_flatten                  :      0.000 s      17.1 kB
 beam_z                        :      0.000 s      17.0 kB
 beam_validator_weak           :      0.000 s      17.0 kB
 beam_asm                      :      0.000 s       9.6 kB
Compiling 1 file (.ex)
Compiling /.../elixir/cloned_repos/type_check_example/lib/typecheck_example.ex
 get_module_name_from_core     :      0.000 s      43.0 kB
 core_lint_module              :      0.000 s      43.9 kB
 sys_core_fold                 :      0.000 s      38.0 kB
 sys_core_alias                :      0.000 s      38.0 kB
 core_transforms               :      0.000 s      38.0 kB
 sys_core_bsm                  :      0.000 s      38.0 kB
 v3_kernel                     :      0.000 s      36.7 kB
 beam_kernel_to_ssa            :      0.000 s      13.2 kB
 beam_ssa_bool                 :      0.000 s      13.2 kB
 beam_ssa_share                :      0.000 s      12.2 kB
 beam_ssa_recv                 :      0.000 s      12.2 kB
 beam_ssa_bsm                  :      0.000 s      12.3 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.000 s  62 %
    annotate_context_parameters:      0.000 s  20 %
    combine_matches            :      0.000 s   8 %
    accept_context_args        :      0.000 s   6 %
    skip_outgoing_tail_extracti:      0.000 s   4 %
 beam_ssa_funs                 :      0.000 s      12.3 kB
 beam_ssa_opt                  :      0.000 s      12.3 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :      0.000 s  28 %
    ssa_opt_live               :      0.000 s  23 %
    ssa_opt_type_continue      :      0.000 s  15 %
    ssa_opt_dead               :      0.000 s   7 %
    ssa_opt_cse                :      0.000 s   4 %
    ssa_opt_merge_blocks       :      0.000 s   3 %
    ssa_opt_linearize          :      0.000 s   2 %
    ssa_opt_record             :      0.000 s   2 %
    ssa_opt_element            :      0.000 s   2 %
    ssa_opt_tail_phis          :      0.000 s   2 %
    ssa_opt_split_blocks       :      0.000 s   1 %
    ssa_opt_tuple_size         :      0.000 s   1 %
    ssa_opt_coalesce_phis      :      0.000 s   1 %
    ssa_opt_sink               :      0.000 s   1 %
    ssa_opt_trim_unreachable   :      0.000 s   1 %
    ssa_opt_bsm                :      0.000 s   1 %
    ssa_opt_try                :      0.000 s   1 %
    ssa_opt_tail_calls         :      0.000 s   1 %
    ssa_opt_float              :      0.000 s   1 %
    ssa_opt_bs_puts            :      0.000 s   0 %
    ssa_opt_sw                 :      0.000 s   0 %
    ssa_opt_bc_size            :      0.000 s   0 %
    ssa_opt_get_tuple_element  :      0.000 s   0 %
    ssa_opt_ne                 :      0.000 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_bsm_shortcut       :      0.000 s   0 %
    ssa_opt_blockify           :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.000 s      12.3 kB
 beam_ssa_pre_codegen          :      0.000 s      12.8 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    assert_no_critical_edges   :      0.000 s  16 %
    place_frames               :      0.000 s  15 %
    live_intervals             :      0.000 s  14 %
    reserve_regs               :      0.000 s  13 %
    linear_scan                :      0.000 s  11 %
    sanitize                   :      0.000 s   7 %
    use_set_tuple_element      :      0.000 s   6 %
    number_instructions        :      0.000 s   4 %
    fix_bs                     :      0.000 s   3 %
    opt_get_list               :      0.000 s   2 %
    match_fail_instructions    :      0.000 s   2 %
    fix_receives               :      0.000 s   1 %
    turn_yregs                 :      0.000 s   1 %
    reserve_yregs              :      0.000 s   1 %
    copy_retval                :      0.000 s   1 %
    frame_size                 :      0.000 s   1 %
    find_yregs                 :      0.000 s   1 %
    legacy_bs                  :      0.000 s   1 %
 beam_ssa_codegen              :      0.000 s      11.3 kB
 beam_validator_strong         :      0.000 s      11.3 kB
 beam_a                        :      0.000 s      11.2 kB
 beam_block                    :      0.000 s      11.7 kB
 beam_jump                     :      0.000 s      11.6 kB
 beam_peep                     :      0.000 s      11.6 kB
 beam_clean                    :      0.000 s      11.6 kB
 beam_trim                     :      0.000 s      11.6 kB
 beam_flatten                  :      0.000 s      11.1 kB
 beam_z                        :      0.000 s      10.9 kB
 beam_validator_weak           :      0.000 s      10.9 kB
 beam_asm                      :      0.000 s       8.0 kB
Compiling /.../elixir/cloned_repos/type_check_example/lib/typecheck_example.ex
 get_module_name_from_core     :      0.000 s  312718.4 kB
 core_lint_module              :      0.597 s  312718.8 kB
 core_old_inliner              :      0.427 s  312952.7 kB
 sys_core_fold                 :      1.875 s  148309.3 kB
 core_fold_after_inlining      :      0.191 s  148309.3 kB
 sys_core_alias                :      0.184 s  148134.7 kB
 core_transforms               :      0.000 s  148134.7 kB
 sys_core_bsm                  :      0.063 s  148134.7 kB
 v3_kernel                     :      0.869 s  162690.6 kB
 beam_kernel_to_ssa            :      1.097 s   93452.1 kB
Compiling lib/typecheck_example.ex (it's taking more than 10s)
 beam_ssa_bool                 :      2.323 s   92574.8 kB
 beam_ssa_share                :      0.313 s   92511.4 kB
 beam_ssa_recv                 :      0.002 s   92511.4 kB
 beam_ssa_bsm                  :      0.525 s   92514.1 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.521 s 100 %
    annotate_context_parameters:      0.000 s   0 %
    combine_matches            :      0.000 s   0 %
    accept_context_args        :      0.000 s   0 %
    skip_outgoing_tail_extracti:      0.000 s   0 %
 beam_ssa_funs                 :      0.212 s   92514.1 kB
 beam_ssa_opt                  :     44.647 s   72528.0 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_sink               :     30.399 s  68 %
    ssa_opt_type_start         :      4.252 s  10 %
    ssa_opt_type_continue      :      3.178 s   7 %
    ssa_opt_live               :      2.206 s   5 %
    ssa_opt_dead               :      2.099 s   5 %
    ssa_opt_cse                :      1.009 s   2 %
    ssa_opt_tail_phis          :      0.449 s   1 %
    ssa_opt_coalesce_phis      :      0.209 s   0 %
    ssa_opt_element            :      0.180 s   0 %
    ssa_opt_trim_unreachable   :      0.112 s   0 %
    ssa_opt_linearize          :      0.092 s   0 %
    ssa_opt_merge_blocks       :      0.089 s   0 %
    ssa_opt_split_blocks       :      0.088 s   0 %
    ssa_opt_bsm                :      0.052 s   0 %
    ssa_opt_record             :      0.050 s   0 %
    ssa_opt_tail_calls         :      0.009 s   0 %
    ssa_opt_float              :      0.009 s   0 %
    ssa_opt_tuple_size         :      0.007 s   0 %
    ssa_opt_ne                 :      0.005 s   0 %
    ssa_opt_blockify           :      0.005 s   0 %
    ssa_opt_try                :      0.004 s   0 %
    ssa_opt_get_tuple_element  :      0.004 s   0 %
    ssa_opt_bc_size            :      0.003 s   0 %
    ssa_opt_bs_puts            :      0.002 s   0 %
    ssa_opt_sw                 :      0.001 s   0 %
    ssa_opt_bsm_shortcut       :      0.001 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.073 s   72528.0 kB
 beam_ssa_pre_codegen          :      1.055 s   75843.1 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    find_yregs                 :      0.215 s  20 %
    reserve_yregs              :      0.187 s  18 %
    live_intervals             :      0.150 s  14 %
    place_frames               :      0.095 s   9 %
    linear_scan                :      0.092 s   9 %
    reserve_regs               :      0.065 s   6 %
    use_set_tuple_element      :      0.047 s   4 %
    frame_size                 :      0.046 s   4 %
    turn_yregs                 :      0.034 s   3 %
    assert_no_critical_edges   :      0.028 s   3 %
    number_instructions        :      0.026 s   2 %
    sanitize                   :      0.023 s   2 %
    copy_retval                :      0.015 s   1 %
    fix_bs                     :      0.014 s   1 %
    opt_get_list               :      0.014 s   1 %
    match_fail_instructions    :      0.002 s   0 %
    fix_receives               :      0.001 s   0 %
    legacy_bs                  :      0.000 s   0 %
 beam_ssa_codegen              :      0.284 s   65449.5 kB
 beam_validator_strong         :      0.464 s   65449.5 kB
 beam_a                        :      0.014 s   65454.2 kB
 beam_block                    :      0.017 s   67159.3 kB
 beam_jump                     :      0.138 s   66564.1 kB
 beam_peep                     :      0.015 s   66564.1 kB
 beam_clean                    :      0.004 s   66564.1 kB
 beam_trim                     :      0.004 s   66562.0 kB
 beam_flatten                  :      0.013 s   64879.4 kB
 beam_z                        :      0.002 s   63937.6 kB
 beam_validator_weak           :      0.471 s   63937.6 kB
 beam_asm                      :      0.342 s   47094.7 kB

mix decompile TypecheckExample --to erl

results in an Erlang source file that is 310MB large. :astonished:

Qqwy commented 2 years ago

@aj-foster Could you re-run your benchmarks on the new master branch (on any commit post 2e0cc95949f3cc7e942559bcd65a68922e1e8b8b), as well as on the compilation_performance3 branch? I'm very curious how much the current optimizations already help on computers that are newer than my 8-year-old laptop.

aj-foster commented 2 years ago

Here's another round of tests, first on latest master and second on compilation_performance3 (commit SHAs listed for clarity).

The main module in the example is egregious in terms of the density of its type specifications, but probably normal in terms of the total number of references to the struct types. While the 10 seconds would add up across multiple such modules, I'd have no problem turning it on, for example, in CI while running tests. Less problematic modules could easily stay on in all environments.

TypeCheck Compilation Benchmarks
(mix clean; time mix compile)

154fd80c with default settings

mix compile  21.87s user 5.34s system 113% cpu 23.983 total
mix compile  21.47s user 5.15s system 114% cpu 23.148 total
mix compile  21.93s user 5.27s system 114% cpu 23.844 total

154fd80c with runtime checks disabled for main module

mix compile  0.80s user 0.75s system 168% cpu 0.913 total
mix compile  0.80s user 0.73s system 161% cpu 0.948 total
mix compile  0.79s user 0.73s system 165% cpu 0.916 total

154fd80c with runtime checks disabled for struct modules

mix compile  21.55s user 5.21s system 115% cpu 23.218 total
mix compile  21.27s user 5.13s system 115% cpu 22.906 total
mix compile  20.94s user 5.60s system 108% cpu 24.508 total

154fd80c with runtime checks disabled for all modules

mix compile  0.80s user 0.70s system 154% cpu 0.966 total
mix compile  0.79s user 0.67s system 128% cpu 1.128 total
mix compile  0.81s user 0.72s system 135% cpu 1.129 total

---

c99981ce with default settings

mix compile  9.93s user 0.98s system 106% cpu 10.221 total
mix compile  9.86s user 1.03s system 107% cpu 10.156 total
mix compile  9.89s user 0.98s system 106% cpu 10.167 total

c99981ce with runtime checks disabled for main module

mix compile  0.77s user 0.69s system 153% cpu 0.955 total
mix compile  0.81s user 0.77s system 162% cpu 0.974 total
mix compile  0.79s user 0.71s system 162% cpu 0.920 total

c99981ce with runtime checks disabled for struct modules

mix compile  9.89s user 1.03s system 106% cpu 10.221 total
mix compile  10.18s user 1.03s system 106% cpu 10.554 total
mix compile  9.87s user 1.00s system 107% cpu 10.134 total

c99981ce with runtime checks disabled for all modules

mix compile  0.81s user 0.73s system 146% cpu 1.054 total
mix compile  0.81s user 0.74s system 152% cpu 1.018 total
mix compile  0.78s user 0.69s system 157% cpu 0.934 total

All of these tests were run on Elixir 1.13.3 and OTP 24.3.2.