Closed noizfactory closed 6 years ago
That's used to create buffers on the stack, and shouldn't really affect much else. It was chosen pretty arbitrarily. I'd probably recommend running the tests to be sure.
The only downside is that giant manifests don't work terribly well in the plugins, but this is really more about total manifest size than the length of the longest names.
Thanks Jonah. This seems to working fine for us in couple of production shots we tested. Should I submit a PR for this or do you think its better to let the defaults be 255?
I think we should do performance testing to see if it's a good idea. While we have unit tests to ensure correctness, we don't have any performance testing in place, and I'd like to get that going before making a change like this.
I ran some comparative tests between the default crypto shader with 255 string limit and the new one with the 2047 limit and so far I don't see much of a difference in the memory profile of a production scene. I do see more background shader calls though but can that be attributed to this change? I will let you know if I notice any in other scenes.
MAX_STRING_LENGTH = 2047
-----------------------------------------------------------------------------------
| memory consumed in MB:
| at startup 34.59
| plugins 40.95
| AOV samples 16722.40
| output buffers 129.00
| node overhead 9.57
| instance overhead 2.03
| message passing 0.05
| memory pools 38.57
| geometry 19080.31
| polymesh 729.03
| subdivs 7000.92
| curves 11350.17
| accel. structs 27491.51
| skydome importance map 190.76
| strings 26.05
| texture cache 2048.01
| unaccounted 5977.58
| total peak 71791.38
| -----------------------------------------------------------------------------------
| ray counts: ( /pixel, /sample) (% total) (avg. hits) (max hits)
| camera 190125000 ( 103.15, 1.00) ( 2.76%) ( 0.82) ( 3)
| shadow 6077556993 (3297.29, 31.97) ( 88.20%) ( 0.59) ( 3)
| diffuse_reflect 475783766 ( 258.13, 2.50) ( 6.90%) ( 0.48) ( 3)
| diffuse_transmit 31362329 ( 17.02, 0.16) ( 0.46%) ( 0.92) ( 2)
| specular_reflect 102844474 ( 55.80, 0.54) ( 1.49%) ( 0.68) ( 3)
| specular_transmit 24286 ( 0.01, 0.00) ( 0.00%) ( 0.98) ( 2)
| bssrdf 13318916 ( 7.23, 0.07) ( 0.19%) ( 1.05) ( 3)
| total 6891015764 (3738.62, 36.24) (100.00%) ( 0.59) ( 3)
| by ray depth: 0 1 2 3 4
| total 80.0% 19.5% 0.5% 0.0% 0.0%
| -----------------------------------------------------------------------------------
| shader calls ( /pixel, /sample) (% total)
| primary 10030545197 ( 5441.92, 52.76) ( 70.11%)
| transparent_shadow 25065 ( 0.01, 0.00) ( 0.00%)
| autobump 562189902 ( 305.01, 2.96) ( 3.93%)
| background 651037914 ( 353.21, 3.42) ( 4.55%)
| light_filter 3038527487 ( 1648.51, 15.98) ( 21.24%)
| importance 25000000 ( 0.17%)
| total 14307325565 ( 7762.22, 75.25) (100.00%)
| by ray depth: 0 1 2 3 4
| total 76.1% 23.2% 0.8% 0.0% 0.0%
| sss lookups 30891391 ( 16.76, 0.16)
| -----------------------------------------------------------------------------------
MAX_STRING_LENGTH = 255
-----------------------------------------------------------------------------------
| memory consumed in MB:
| at startup 33.19
| plugins 39.31
| AOV samples 16722.40
| output buffers 129.00
| node overhead 10.55
| instance overhead 2.03
| message passing 0.05
| memory pools 38.07
| geometry 19257.08
| polymesh 729.04
| subdivs 7175.85
| curves 11352.00
| accel. structs 27496.86
| skydome importance map 190.76
| strings 20.42
| texture cache 2047.89
| unaccounted 6002.17
| total peak 71989.78
| -----------------------------------------------------------------------------------
| ray counts: ( /pixel, /sample) (% total) (avg. hits) (max hits)
| camera 190125000 ( 103.15, 1.00) ( 2.76%) ( 0.82) ( 3)
| shadow 6077556855 (3297.29, 31.97) ( 88.20%) ( 0.59) ( 3)
| diffuse_reflect 475783740 ( 258.13, 2.50) ( 6.90%) ( 0.48) ( 3)
| diffuse_transmit 31362329 ( 17.02, 0.16) ( 0.46%) ( 0.92) ( 2)
| specular_reflect 102844471 ( 55.80, 0.54) ( 1.49%) ( 0.68) ( 3)
| specular_transmit 24286 ( 0.01, 0.00) ( 0.00%) ( 0.98) ( 2)
| bssrdf 13318909 ( 7.23, 0.07) ( 0.19%) ( 1.05) ( 3)
| total 6891015590 (3738.62, 36.24) (100.00%) ( 0.59) ( 3)
| by ray depth: 0 1 2 3 4
| total 80.0% 19.5% 0.5% 0.0% 0.0%
| -----------------------------------------------------------------------------------
| shader calls ( /pixel, /sample) (% total)
| primary 10039943921 ( 5447.02, 52.81) ( 73.28%)
| transparent_shadow 25065 ( 0.01, 0.00) ( 0.00%)
| autobump 562189902 ( 305.01, 2.96) ( 4.10%)
| background 34265898 ( 18.59, 0.18) ( 0.25%)
| light_filter 3038528001 ( 1648.51, 15.98) ( 22.18%)
| importance 25000000 ( 0.18%)
| total 13699952787 ( 7432.70, 72.06) (100.00%)
| by ray depth: 0 1 2 3 4
| total 75.0% 24.2% 0.8% 0.0% 0.0%
| sss lookups 30891416 ( 16.76, 0.16)
| -----------------------------------------------------------------------------------
Interesting - the differences there look like noise to me.
I'm a bit more concerned about speed performance than the extra memory. Those buffers are created on the stack, not the heap, which I think should mean they only really last as long as it takes the run the shader. The performance I'm worried about is hurting cache locality by evicting more things from the L1 cache to allocate those buffers. I think it's probably fine, but I'd like to be sure.
What I intend to do when I get to it is create a scenario where the hash is constantly changing from sample to sample, like several objects motion blurred over each other at high AA. (Otherwise the last hash cache kicks in). I'd turn off all indirect sampling and set Cryptomatte to depth=2, to minimize the filtering overhead. If there's no speed difference measurable there, where we've stacked the deck against it, then I think it's fine.
We could also just measure the performance of the name processing and hashing code directly, but I think it might be misleading because that code doesn't really contribute to the total time anyway. It wasn't even measurable in a render last I checked, so I'm more concerned about the effect on the L1 cache.
Thanks for your insights Jonah. Sure, I can run a test on the same production scene with motion blur again. It has enough overlapping and thin objects like foliage, grass and fur which should also help with this test. Will post the stats when I have the results.
This is done in 9ae232c. I could measure no performance difference.
Hey guys,
I'm hitting the max string limit for our scenes because we are using full path names for the mtoa scene export. Its a long story about why we are exporting full path names but in the end, we need to use full path names for some workflow reason.
I increased the MAX_STRING_LENGTH in our local copy of the shader from 255 to 2047 and re-ran the render and it worked as expected. Are there any serious downsides to this (besides slightly more memory allocation for strings) since there may be a good reason for limiting it to 255 in the first place.
Thanks, Sachin