facebookincubator / cinder

Cinder is Meta's internal performance-oriented production version of CPython.
https://trycinder.com

add README for benchmarks #67

Open belm0 opened 2 years ago

belm0 commented 2 years ago

When I build Cinder and run the programs in Tools/benchmarks, the static and static_basic variants seem to be slower than the originals. Am I doing something wrong?

(update: -X jit helps, but full static is only about 15% faster?)

$ ./configure && make
$ cd Tools/benchmarks

$ time ../../python.exe fannkuch.py 5
real    0m6.963s

$ time ../../python.exe fannkuch_static_basic.py 5
real    0m7.500s
$ time ../../python.exe -X install-strict-loader fannkuch_static_basic.py 5
real    0m12.140s
$ time ../../python.exe -X install-strict-loader -X jit fannkuch_static_basic.py 5
real    0m6.988s

$ time ../../python.exe fannkuch_static.py 5
real    1m12.689s
$ time ../../python.exe -X install-strict-loader fannkuch_static.py 5
real    0m55.872s
$ time ../../python.exe -X install-strict-loader -X jit fannkuch_static.py 5
real    0m6.071s
belm0 commented 2 years ago

More runs look better. Some benchmarks are still slower.

$ time ../../python.exe fannkuch.py 20
real    0m29.641s
$ time ../../python.exe -X install-strict-loader -X jit fannkuch_static.py 20
real    0m17.892s

$ time ../../python.exe richards.py 200
real    0m30.914s
$ time ../../python.exe -X install-strict-loader -X jit richards_static.py 200
real    0m7.475s

$ time ../../python.exe nqueens.py 20
real    0m5.657s
$ time ../../python.exe -X install-strict-loader -X jit nqueens_static.py 20
real    0m12.813s
belm0 commented 2 years ago

Better with the jit-list constraints (static nqueens is still much slower overall):

$ time ../../python.exe -X install-strict-loader -X jit -X jit-list-file=jitlist_richards_static.txt -X jit-enable-jit-list-wildcards richards_static.py 200
JIT: Jit/pyjit.cpp:925 -- Enabling wildcards in JIT list
JIT: Jit/jit_list.cpp:33 -- Jit-list file: jitlist_richards_static.txt
real    0m5.802s

$ time ../../python.exe nqueens.py 40
real    0m11.296s
$ time ../../python.exe -X install-strict-loader -X jit -X jit-list-file=jitlist_nqueens_static_basic.txt -X jit-enable-jit-list-wildcards nqueens_static_basic.py 40
JIT: Jit/pyjit.cpp:925 -- Enabling wildcards in JIT list
JIT: Jit/jit_list.cpp:33 -- Jit-list file: jitlist_nqueens_static_basic.txt
real    0m9.917s
$ time ../../python.exe -X install-strict-loader -X jit -X jit-list-file=jitlist_nqueens_static.txt -X jit-enable-jit-list-wildcards nqueens_static.py 40
JIT: Jit/pyjit.cpp:925 -- Enabling wildcards in JIT list
JIT: Jit/jit_list.cpp:33 -- Jit-list file: jitlist_nqueens_static.txt
real    0m24.740s

I suggest adding a benchmarks/README with some example invocations like these.
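For illustration, one possible shape for such a README entry, using only invocations already shown in this thread (the wording is hypothetical, not from the repo):

# build, then run from Tools/benchmarks
$ ./configure && make
$ cd Tools/benchmarks

# baseline (interpreter only)
$ ../../python.exe richards.py 200

# Static Python + JIT
$ ../../python.exe -X install-strict-loader -X jit richards_static.py 200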

carljm commented 2 years ago

Yes, we definitely need a README for the benchmarks, thanks for the report! You seem to have converged on the right way to run them yourself, though.

Static nqueens is new and under active development, so I wouldn't worry too much about it yet. Richards, deltablue, and fannkuch should all be much faster under Static Python and with the JIT (and faster with SP+JIT than with the JIT alone).

SP without the JIT is more of a mixed bag: some of the arithmetic-heavy benchmarks (e.g. fannkuch) use primitives heavily in the static version, and we only actually keep primitives unboxed in the JIT. Fannkuch also has uncharacteristic performance without the JIT because our bytecode quickening currently operates at function level based on number of calls, and fannkuch is just one very expensive function that is called only once, so quickening never kicks in.
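To illustrate that last point, here is a toy Python sketch (not Cinder code; the call-count threshold and heuristic are assumptions made up for the example) of why quickening keyed to how many times a function has been called can miss a benchmark that is one expensive function called once:

# Hypothetical illustration only -- not Cinder's implementation.
# Assume bytecode is quickened only after a function has been *entered*
# some threshold number of times (say, 2).

def fannkuch_like(n):
    # All the work happens inside a single call, so the function never
    # reaches the threshold while its hot loop is running; the loop runs
    # un-quickened bytecode for the entire benchmark.
    total = 0
    for i in range(n):
        total += (i * 7) % 13
    return total

def richards_like_step(x):
    # Called millions of times; the counter crosses the threshold almost
    # immediately, so nearly all the work runs quickened.
    return (x * 7) % 13

fannkuch_like(10_000_000)                              # one long call
sum(richards_like_step(i) for i in range(10_000_000))  # many short calls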

I'll keep this open to track getting both a README on running the benchmarks, and our results from running them, added to the repo.

belm0 commented 2 years ago

Managing jit-lists seems tedious in general for applications, and wildcards have known performance problems (#29).

I wonder if it would be easier all around to have a mode where only functions in static modules are jitted.

carljm commented 2 years ago

We do actually have that mode too, -X jit-all-static-functions. I think the only reason I used a wildcard jit list for these benchmarks is that I wanted a fair comparison with running the non-static benchmark under the jit, and it didn't seem fair to expose only one of them to the wildcard jit list overhead.

Long-term for many applications the right answer is probably a dynamic mode where hot functions are jitted once they become hot in the process. It just hasn't been a priority because our application is a prefork webserver, so that mode wouldn't work for us. But we are picking up more workloads now, so it might happen sometime soon.

carljm commented 2 years ago

(Oh, one gotcha for -X jit-all-static-functions: it's additive, so -X jit -X jit-all-static-functions is the same as -X jit; to jit only static functions you need -X jit -X jit-list-file=/dev/null -X jit-all-static-functions.)
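Following that note, a static-only JIT run of the benchmarks above might look like this (a sketch assuming the same Tools/benchmarks working directory; output and timings omitted):

$ ../../python.exe -X install-strict-loader -X jit -X jit-list-file=/dev/null -X jit-all-static-functions richards_static.py 200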

belm0 commented 2 years ago

A script to run the benchmarks was added recently: https://github.com/facebookincubator/cinder/commit/77d5d1f55a50b9e099238c9e4f177ee8d668c646