crertel / graphmath

An Elixir library for performing 2D and 3D mathematics.
The Unlicense

Optimization: add type guards to functions? #43

Closed icefoxen closed 1 month ago

icefoxen commented 3 months ago

Apparently adding type guards on numerical functions can let the BEAM optimizer generate better code; Wings3D uses this heavily. If you're interested, I can try it out and benchmark how much it actually helps.
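For readers unfamiliar with the idea, here is a minimal sketch of what "adding type guards" means for a 3-vector add. The module and function names are hypothetical and are not graphmath's actual code; the only assumption carried over is that a vector is a 3-tuple.

```elixir
defmodule GuardSketch.Vec3 do
  # Hypothetical module for illustration only; not part of graphmath.

  # Unguarded version: accepts any terms that support `+`.
  def add({x, y, z}, {u, v, w}), do: {x + u, y + v, z + w}

  # Guarded version: the clause only matches when every component is a float,
  # so the compiler knows the operand types inside the body.
  def add_guarded({x, y, z}, {u, v, w})
      when is_float(x) and is_float(y) and is_float(z) and
             is_float(u) and is_float(v) and is_float(w) do
    {x + u, y + v, z + w}
  end
end
```

Both functions return the same result for float inputs; whether the guarded one is actually faster is exactly what the benchmarks in this thread try to measure.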

crertel commented 3 months ago

That would be lovely!

My main concern is whether the guard will still work for both integers and floats.

I'd love to use is_number, if that would have similar benefits.
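To make the integer-versus-float concern concrete, here is a small sketch contrasting the two guards on a dot product. The module and function names are hypothetical; only `is_float/1` and `is_number/1` are real Elixir guards.

```elixir
defmodule GuardSketch.Dot do
  # Hypothetical module for illustration only; not part of graphmath.

  # Matches only when all six components are floats.
  def dot_float({x, y, z}, {u, v, w})
      when is_float(x) and is_float(y) and is_float(z) and
             is_float(u) and is_float(v) and is_float(w) do
    x * u + y * v + z * w
  end

  # Matches integers and floats alike.
  def dot_number({x, y, z}, {u, v, w})
      when is_number(x) and is_number(y) and is_number(z) and
             is_number(u) and is_number(v) and is_number(w) do
    x * u + y * v + z * w
  end
end
```

With integer inputs, `GuardSketch.Dot.dot_number({1, 2, 3}, {4, 5, 6})` returns `32`, while `GuardSketch.Dot.dot_float({1, 2, 3}, {4, 5, 6})` raises a `FunctionClauseError` because no clause matches. So an `is_float`-only guard changes behavior for integer callers unless there is another clause to catch them.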

icefoxen commented 3 months ago

I dunno, good question! I'm used to assuming that everything in a vector is a float, but I'll try out a few variations and see how it goes.

icefoxen commented 3 months ago

I have some results! Confusing ones. Altered graphmath to add guards to a few functions in this fork: https://github.com/icefoxen/graphmath

Benchmarked with this set of tests: https://hg.sr.ht/~icefox/graphmath_bench

Unaltered code:

Name                            ips        average  deviation         median         99th %
graphmath_vec3_add          13.01 M       76.84 ns  ±9159.72%          57 ns         210 ns
graphmath_vec3_dot           9.90 M      101.03 ns ±18801.93%          76 ns         274 ns
graphmath_mat4_transp        8.12 M      123.14 ns ±37117.69%          32 ns         125 ns
graphmath_vec3_cross         6.52 M      153.28 ns ±12000.18%         116 ns         422 ns
graphmath_vm4_mul            4.81 M      207.93 ns  ±1560.15%         174 ns         632 ns
graphmath_mat4_add           4.65 M      215.15 ns  ±2762.27%         170 ns         655 ns
graphmath_vec3_rotate        2.66 M      376.20 ns  ±6629.31%         301 ns        1092 ns
graphmath_mat4_mul           0.77 M     1305.93 ns   ±397.82%        1093 ns        4081 ns

Name                     Memory usage
graphmath_vec3_add               32 B
graphmath_vec3_dot                0 B
graphmath_mat4_transp           136 B
graphmath_vec3_cross             32 B
graphmath_vm4_mul                32 B
graphmath_mat4_add              136 B
graphmath_vec3_rotate           208 B
graphmath_mat4_mul              136 B

Type guards using is_float():

Name                            ips        average  deviation         median         99th %
graphmath_vec3_dot          14.82 M       67.46 ns ±37823.92%          38 ns         138 ns
graphmath_vec3_cross         9.06 M      110.41 ns ±32179.37%          49 ns         169 ns
graphmath_vec3_add           8.97 M      111.48 ns ±46094.33%          41 ns         146 ns
graphmath_mat4_transp        8.42 M      118.78 ns ±43366.05%          32 ns         140 ns
graphmath_vm4_mul            7.91 M      126.39 ns ±30059.67%          64 ns         229 ns
graphmath_vec3_rotate        6.86 M      145.75 ns ±23640.86%          91 ns         330 ns
graphmath_mat4_add           4.00 M      250.14 ns ±24711.68%          93 ns         357 ns
graphmath_mat4_mul           2.77 M      360.52 ns ±15763.40%         223 ns         812 ns

Name                     Memory usage
graphmath_vec3_dot               16 B
graphmath_vec3_cross             80 B
graphmath_vec3_add               80 B
graphmath_mat4_transp           136 B
graphmath_vm4_mul                80 B
graphmath_vec3_rotate            80 B
graphmath_mat4_add              392 B
graphmath_mat4_mul              392 B

Type guards using is_float(), with a fallback using no guards:

Name                            ips        average  deviation         median         99th %
graphmath_vec3_dot          14.18 M       70.50 ns ±38031.32%          40 ns         143 ns
graphmath_vec3_add           8.89 M      112.45 ns ±46384.95%          42 ns         150 ns
graphmath_mat4_transp        8.80 M      113.65 ns ±33097.61%          32 ns         127 ns
graphmath_vm4_mul            8.65 M      115.57 ns ±25127.65%          64 ns         225 ns
graphmath_vec3_cross         6.80 M      147.05 ns  ±2449.46%         114 ns         420 ns
graphmath_vec3_rotate        5.97 M      167.44 ns ±23624.68%          95 ns         335 ns
graphmath_mat4_add           4.99 M      200.23 ns ±22949.23%          89 ns         325 ns
graphmath_mat4_mul           2.73 M      365.69 ns ±13068.49%         231 ns         834 ns

Name                     Memory usage
graphmath_vec3_dot               16 B
graphmath_vec3_add               80 B
graphmath_mat4_transp           136 B
graphmath_vm4_mul                80 B
graphmath_vec3_cross             32 B
graphmath_vec3_rotate            80 B
graphmath_mat4_add              392 B
graphmath_mat4_mul              392 B

Type guards using is_number(), with a fallback using no guards:

Name                            ips        average  deviation         median         99th %
graphmath_vec3_add          12.51 M       79.96 ns  ±5947.86%          60 ns         225 ns
graphmath_vec3_dot           9.92 M      100.79 ns ±15170.89%          80 ns         246 ns
graphmath_mat4_transp        9.41 M      106.27 ns ±28274.27%          32 ns         128 ns
graphmath_vec3_cross         6.29 M      158.88 ns ±12414.54%         115 ns         420 ns
graphmath_vm4_mul            4.27 M      234.30 ns  ±1652.00%         194 ns         705 ns
graphmath_mat4_add           4.24 M      235.83 ns  ±2705.20%         197 ns         726 ns
graphmath_vec3_rotate        2.54 M      394.43 ns  ±7408.05%         309 ns        1125 ns
graphmath_mat4_mul           0.80 M     1251.89 ns   ±397.93%        1107 ns        3982 ns

Name                     Memory usage
graphmath_vec3_add               32 B
graphmath_vec3_dot                0 B
graphmath_mat4_transp           136 B
graphmath_vec3_cross             32 B
graphmath_vm4_mul                32 B
graphmath_mat4_add              136 B
graphmath_vec3_rotate           208 B
graphmath_mat4_mul              136 B

Sooooo yeah, the results are... confusing, and probably very noisy, though they seem repeatable enough. My half-assed conclusions so far:

crertel commented 3 months ago

That's wild, thank you.

I think this is enough for me to be comfortable switching to is_float, with an unguarded fallback for the weird cases.
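As a rough sketch of the "is_float with a fallback" shape discussed here (hypothetical module name, not the code from the fork or from graphmath itself): the guarded clause comes first, and a second clause without guards handles everything else, such as integer or mixed inputs.

```elixir
defmodule GuardSketch.Fallback do
  # Hypothetical module for illustration only; not part of graphmath.

  # Fast path: taken when every component is a float.
  def add({x, y, z}, {u, v, w})
      when is_float(x) and is_float(y) and is_float(z) and
             is_float(u) and is_float(v) and is_float(w) do
    {x + u, y + v, z + w}
  end

  # Fallback: same arithmetic, no guards, so integers and mixed
  # integer/float vectors keep working as before.
  def add({x, y, z}, {u, v, w}) do
    {x + u, y + v, z + w}
  end
end
```

Here `GuardSketch.Fallback.add({1, 2, 3}, {4.0, 5.0, 6.0})` falls through to the second clause and returns `{5.0, 7.0, 9.0}`, so existing callers that pass integers are not broken by the guard.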

icefoxen commented 3 months ago

Yeah, that seems like the way to go. The only case where adding type guards mysteriously makes life slower is Vec3.add and Mat44.add, for some reason. I'd love to understand why, but in practical terms I don't care that much.

crertel commented 2 months ago

@icefoxen just a heads-up...haven't forgotten about your good work here, just been a little swamped (literally--hit by Hurricane Beryl).

crertel commented 1 month ago

@icefoxen sorry to be a bother--can you check if the Mat44 and Mat33 work in #45 help with the performance numbers you noticed?