anderslanglands / alShaders2

BSD 3-Clause "New" or "Revised" License
75 stars 38 forks source link

MAX_STRING_LENGTH more than 255? #15

Closed noizfactory closed 6 years ago

noizfactory commented 6 years ago

Hey guys,

I'm hitting the max string limit for our scenes because we are using full path names for the mtoa scene export. Its a long story about why we are exporting full path names but in the end, we need to use full path names for some workflow reason.

I increased the MAX_STRING_LENGTH in our local copy of the shader from 255 to 2047 and re-ran the render and it worked as expected. Are there any serious downsides to this (besides slightly more memory allocation for strings) since there may be a good reason for limiting it to 255 in the first place.

Thanks, Sachin

jonahfriedman commented 6 years ago

That's used to create buffers on the stack, and shouldn't really affect much else. It was chosen pretty arbitrarily. I'd probably recommend running the tests to be sure.

The only downside is that giant manifests don't work terribly well in the plugins, but this is really more about total manifest size than the length of the longest names.

noizfactory commented 6 years ago

Thanks Jonah. This seems to working fine for us in couple of production shots we tested. Should I submit a PR for this or do you think its better to let the defaults be 255?

jonahfriedman commented 6 years ago

I think we should do performance testing to see if it's a good idea. While we have unit tests to ensure correctness, we don't have any performance testing in place, and I'd like to get that going before making a change like this.

noizfactory commented 6 years ago

I ran some comparative tests between the default crypto shader with 255 string limit and the new one with the 2047 limit and so far I don't see much of a difference in the memory profile of a production scene. I do see more background shader calls though but can that be attributed to this change? I will let you know if I notice any in other scenes.

MAX_STRING_LENGTH = 2047

-----------------------------------------------------------------------------------
| memory consumed in MB:
|  at startup                34.59
|  plugins                   40.95
|  AOV samples            16722.40
|  output buffers           129.00
|  node overhead              9.57
|  instance overhead          2.03
|  message passing            0.05
|  memory pools              38.57
|  geometry               19080.31
|    polymesh               729.03
|    subdivs               7000.92
|    curves               11350.17
|  accel. structs         27491.51
|  skydome importance map   190.76
|  strings                   26.05
|  texture cache           2048.01
|  unaccounted             5977.58
|  total peak             71791.38
| -----------------------------------------------------------------------------------
| ray counts:                      ( /pixel, /sample) (% total) (avg. hits) (max hits)
|  camera                190125000 ( 103.15,    1.00) (  2.76%) (     0.82) (       3)
|  shadow               6077556993 (3297.29,   31.97) ( 88.20%) (     0.59) (       3)
|  diffuse_reflect       475783766 ( 258.13,    2.50) (  6.90%) (     0.48) (       3)
|  diffuse_transmit       31362329 (  17.02,    0.16) (  0.46%) (     0.92) (       2)
|  specular_reflect      102844474 (  55.80,    0.54) (  1.49%) (     0.68) (       3)
|  specular_transmit         24286 (   0.01,    0.00) (  0.00%) (     0.98) (       2)
|  bssrdf                 13318916 (   7.23,    0.07) (  0.19%) (     1.05) (       3)
|  total                6891015764 (3738.62,   36.24) (100.00%) (     0.59) (       3)
| by ray depth:              0     1     2     3     4
|  total                   80.0% 19.5%  0.5%  0.0%  0.0%
| -----------------------------------------------------------------------------------
| shader calls                     (  /pixel,  /sample) (% total)
|  primary             10030545197 ( 5441.92,    52.76) ( 70.11%)
|  transparent_shadow        25065 (    0.01,     0.00) (  0.00%)
|  autobump              562189902 (  305.01,     2.96) (  3.93%)
|  background            651037914 (  353.21,     3.42) (  4.55%)
|  light_filter         3038527487 ( 1648.51,    15.98) ( 21.24%)
|  importance             25000000                      (  0.17%)
|  total               14307325565 ( 7762.22,    75.25) (100.00%)
| by ray depth:              0     1     2     3     4
|  total                   76.1% 23.2%  0.8%  0.0%  0.0%
| sss lookups             30891391 (  16.76,    0.16)
| -----------------------------------------------------------------------------------

MAX_STRING_LENGTH = 255

-----------------------------------------------------------------------------------
| memory consumed in MB:
|  at startup                33.19
|  plugins                   39.31
|  AOV samples            16722.40
|  output buffers           129.00
|  node overhead             10.55
|  instance overhead          2.03
|  message passing            0.05
|  memory pools              38.07
|  geometry               19257.08
|    polymesh               729.04
|    subdivs               7175.85
|    curves               11352.00
|  accel. structs         27496.86
|  skydome importance map   190.76
|  strings                   20.42
|  texture cache           2047.89
|  unaccounted             6002.17
|  total peak             71989.78
| -----------------------------------------------------------------------------------
| ray counts:                      ( /pixel, /sample) (% total) (avg. hits) (max hits)
|  camera                190125000 ( 103.15,    1.00) (  2.76%) (     0.82) (       3)
|  shadow               6077556855 (3297.29,   31.97) ( 88.20%) (     0.59) (       3)
|  diffuse_reflect       475783740 ( 258.13,    2.50) (  6.90%) (     0.48) (       3)
|  diffuse_transmit       31362329 (  17.02,    0.16) (  0.46%) (     0.92) (       2)
|  specular_reflect      102844471 (  55.80,    0.54) (  1.49%) (     0.68) (       3)
|  specular_transmit         24286 (   0.01,    0.00) (  0.00%) (     0.98) (       2)
|  bssrdf                 13318909 (   7.23,    0.07) (  0.19%) (     1.05) (       3)
|  total                6891015590 (3738.62,   36.24) (100.00%) (     0.59) (       3)
| by ray depth:              0     1     2     3     4
|  total                   80.0% 19.5%  0.5%  0.0%  0.0%
| -----------------------------------------------------------------------------------
| shader calls                     (  /pixel,  /sample) (% total)
|  primary             10039943921 ( 5447.02,    52.81) ( 73.28%)
|  transparent_shadow        25065 (    0.01,     0.00) (  0.00%)
|  autobump              562189902 (  305.01,     2.96) (  4.10%)
|  background             34265898 (   18.59,     0.18) (  0.25%)
|  light_filter         3038528001 ( 1648.51,    15.98) ( 22.18%)
|  importance             25000000                      (  0.18%)
|  total               13699952787 ( 7432.70,    72.06) (100.00%)
| by ray depth:              0     1     2     3     4
|  total                   75.0% 24.2%  0.8%  0.0%  0.0%
| sss lookups             30891416 (  16.76,    0.16)
| -----------------------------------------------------------------------------------
jonahfriedman commented 6 years ago

Interesting - the differences there look like noise to me.

I'm a bit more concerned about speed performance than the extra memory. Those buffers are created on the stack, not the heap, which I think should mean they only really last as long as it takes the run the shader. The performance I'm worried about is hurting cache locality by evicting more things from the L1 cache to allocate those buffers. I think it's probably fine, but I'd like to be sure.

What I intend to do when I get to it is create a scenario where the hash is constantly changing from sample to sample, like several objects motion blurred over each other at high AA. (Otherwise the last hash cache kicks in). I'd turn off all indirect sampling and set Cryptomatte to depth=2, to minimize the filtering overhead. If there's no speed difference measurable there, where we've stacked the deck against it, then I think it's fine.

We could also just measure the performance of the name processing and hashing code directly, but I think it might be misleading because that code doesn't really contribute to the total time anyway. It wasn't even measurable in a render last I checked, so I'm more concerned about the effect on the L1 cache.

noizfactory commented 6 years ago

Thanks for your insights Jonah. Sure, I can run a test on the same production scene with motion blur again. It has enough overlapping and thin objects like foliage, grass and fur which should also help with this test. Will post the stats when I have the results.

jonahfriedman commented 6 years ago

This is done in 9ae232c. I could measure no performance difference.