PistonDevelopers / dyon

A rusty dynamically typed scripting language
Apache License 2.0
1.77k stars 56 forks source link

Solving parser bottleneck #430

Open bvssvni opened 7 years ago

bvssvni commented 7 years ago

Did some profiling with cmr's hprof library on a 582 loc file:

Timing information for root profiler:
  run - 1 * 1.1s = 1.1s @ 0.9hz
    lazy - 1 * 114ns = 114ns (0.0%)
    parse - 1 * 1.1s = 1.1s (98.3%)
    prelude - 1 * 577.6us = 577.6us (0.1%)
    check - 1 * 38.0us = 38.0us (0.0%)
    convert - 1 * 5.9ms = 5.9ms (0.5%)
    join - 1 * 1.6ms = 1.6ms (0.1%)
    ignored - 1 * 94ns = 94ns (0.0%)

Over 98% of the time is spent parsing.

lib.rs:

let parse_guard = enter("parse");
    let mut data = vec![];
    try!(parse_errstr(syntax_rules, &d, &mut data).map_err(
        |err| format!("In `{}:`\n{}", source, err)
    ));
    drop(parse_guard);

Could figure out what's taking so much time in Piston-Meta.

Develop binary format for Piston-Meta

The meta data is all you need, while the syntax is just to make it readable. Sometimes you reuse the same data over and over, but want to load it as fast as possible.

Could develop an efficient binary format for meta data that loads much quicker. This format can also save space by using a string table. Meta data conversion could happen directly from binary data (changes required in Piston-Meta).

The benefit with this direction is that all meta formats automatically gets a binary format.

Add a bin type to Dyon

Also been thinking about compression and privacy (simple encryption, for example for game content). Then it hit me that perhaps Dyon could get a built-in bin type. Could also use it as a bit vector.

bvssvni commented 7 years ago

Perhaps a better idea: Generate meta data in Rust code.

If you don't need the original source, you might as well include the meta data in the binary executable. This will skip the parsing step completely, while being compatible with all meta data converters in Rust.

bvssvni commented 7 years ago

Added new function for loading directly from meta data in https://github.com/PistonDevelopers/dyon/pull/432. Included an example that generates meta data in Rust code. It works!

Upside is that the binary becomes self contained, with no external code. Loads very fast, because there is no parsing. Skips lifetime and type checks by assuming the source already has been checked. This technique could be used when packaging apps.

Downside is that build times in release mode are scary: 1167.39 seconds for the n-body example. This should be straight forward code for the Rust compiler. I am wondering whether LLVM does some partial evaluation since it knows the input and this could be the reason it takes so long time.

$ time ./target/debug/examples/test
-0.16907514416856514
-0.16908671301860925

real    0m1.378s
user    0m1.368s
sys 0m0.009s

$ time ./target/release/examples/test
-0.16907514416856514
-0.16908671301860925

real    0m0.077s
user    0m0.069s
sys 0m0.006s

$ time ./target/debug/examples/test_compile
-0.16907514416856514
-0.16908671301860925

real    0m0.888s
user    0m0.827s
sys 0m0.014s

$ time ./target/release/examples/test_compile
-0.16907514416856514
-0.16908671301860925

real    0m0.051s
user    0m0.041s
sys 0m0.006s
nagisa commented 7 years ago

Try -Z time-passes flag for rustc to see where time is spent (e.g. export RUSTFLAGS="-Z time-passes"). Feel free to also open an issue on the rustc. I assume that by “meta data” you mean some generated rust code; it might be hitting some super-linear code path somewhere and ideally none would exist; getting rid of it is difficult sometimes.

bvssvni commented 7 years ago

@nagisa Thanks! Running it now.

bvssvni commented 7 years ago

Here is what I got:

$ cargo run --release --example test_compile
   Compiling dyon v0.16.0 (file:///Users/sven/rust/dyon)
time: 0.039 parsing
time: 0.000 recursion limit
time: 0.000 crate injection
time: 0.000 plugin loading
time: 0.000 plugin registration
time: 0.308 expansion
time: 0.000 maybe building test harness
time: 0.001 maybe creating a macro crate
time: 0.000 checking for inline asm in case the target doesn't support it
time: 0.001 complete gated feature checking
time: 0.005 early lint checks
time: 0.001 AST validation
time: 0.013 name resolution
time: 0.007 lowering ast -> hir
time: 0.001 indexing hir
time: 0.001 attribute checking
time: 0.000 language item collection
time: 0.001 lifetime resolution
time: 0.000 looking for entry point
time: 0.000 looking for plugin registrar
time: 0.003 region resolution
time: 0.001 loop checking
time: 0.000 static item recursion checking
time: 0.011 compute_incremental_hashes_map
time: 0.000 load_dep_graph
time: 0.001 stability index
time: 0.002 stability checking
time: 0.001 type collecting
time: 0.000 variance inference
time: 0.000 impl wf inference
time: 0.008 coherence checking
time: 0.000 wf checking
time: 0.001 item-types checking
time: 0.871 item-bodies checking
time: 0.000 drop-impl checking
time: 0.023 const checking
time: 0.002 privacy checking
time: 0.001 intrinsic checking
time: 0.001 effect checking
time: 0.001 match checking
time: 0.003 liveness checking
time: 0.020 rvalue checking
time: 0.498 MIR dump
  time: 0.015   SimplifyCfg
  time: 0.018   QualifyAndPromoteConstants
  time: 0.020   TypeckMir
  time: 0.000   SimplifyBranches
  time: 0.008   SimplifyCfg
time: 0.062 MIR cleanup and validation
time: 0.121 borrow checking
time: 0.000 reachability checking
time: 0.001 death checking
time: 0.000 unused lib feature checking
time: 0.030 lint checking
time: 0.003 resolving dependency formats
  time: 0.000   NoLandingPads
  time: 0.010   SimplifyCfg
  time: 0.010   EraseRegions
  time: 0.003   AddCallGuards
  time: 0.994   ElaborateDrops
  time: 0.000   NoLandingPads
  time: 0.023   SimplifyCfg
  time: 0.005   InstCombine
  time: 0.000   Deaggregator
  time: 0.000   CopyPropagation
  time: 0.006   SimplifyLocals
  time: 0.003   AddCallGuards
  time: 0.000   PreTrans
time: 1.054 MIR optimisations
  time: 0.000   write metadata
  time: 0.208   translation item collection
  time: 0.002   codegen unit partitioning
  time: 0.002   internalize symbols
time: 0.606 translation
time: 0.000 assert dep graph
time: 0.000 serialize dep graph
  time: 34.105  llvm function passes [0]
  time: 1215.066    llvm module passes [0]
  time: 6.071   codegen passes [0]
  time: 0.001   codegen passes [0]
time: 1255.267  LLVM passes
time: 0.000 serialize work products
  time: 0.432   running linker
time: 0.439 linking

The "llvm module passes" part takes most of the time.

One intriguing idea I have: Knowing the input could trigger some optimization in LLVM, where it attempts to evaluate things, thereby essentially doing what the Dyon run-time is supposed to be doing: Executing the program, but at a much slower speed.

bvssvni commented 7 years ago

Opened https://github.com/rust-lang/rust/issues/39082 with updated to latest nightly.