AcademySoftwareFoundation / OpenShadingLanguage

Advanced shading language for production GI renderers
BSD 3-Clause "New" or "Revised" License
2.09k stars 356 forks source link

Make batched use ustringhash_pod and make batched compile in Windows #1831

Open johnfea opened 3 months ago

johnfea commented 3 months ago

Description

EDIT: Make batched and non-batched internally use the same string representation for reasons discussed below.

Tests

New test cases aren't necessarily required. This PR would fix new failing test case in #1832.

Checklist:

johnfea commented 3 months ago

I also tried to debug a bit why "closure-string" test fails with batched avx2.

oslexec/wide/wide_opclosure.cpp: in void init_closure_component() casting ClosureComponent there into MicrofacetParams printed the same wrong hash for ustringhash as does process_bsdf_closure() in testminimal/oslmaterial.cpp.

oslexec/batched_llvm_gen.cpp and its llvm_gen_closure function: it seems like a promising place to check since it processes all the variables inside microfacet() statement in test.osl, starting with distribution ustringhash. EDIT: This place has the same wrong hash, checked with debug cout: "llvm::outs() << 'loaded: ' << *loaded << '\n';", so the problem happens before that.

johnfea commented 3 months ago

The problem was indeed in llvm_gen_closure function: src/oslexec/batched_llvm_gen.cpp:

"llvm::Value* loaded = rop.llvm_load_value(sym.."

src/liboslexec/batched_backendllvm.cpp:

"llvm::Value*BatchedBackendLLVM::llvm_load_value(const Symbol& sym.."
"ll.constant(string_val);"

src/liboslexec/llvm_util.cpp:

"llvm::Value* LLVM_Util::constant(ustring s)"

This function already has code to select between ustring and ustringhash based on ustring representation, but src/include/OSL/llvm_util.h always selects ustring at least in batched mode.

So I added code in CMakeLists.txt to use already existing OSL_USTRINGREP_IS_HASH to conditionally select hash in src/include/OSL/llvm_util.h and then enabled support for hash for batched functions in src/liboslexec/llvm_util.cpp.

This fixes "closure-string" test but introduces new failed tests because batched functions in liboslexec/wide/* still use ustring_pod instead of ustringhash_pod. I enabled ustringhash_pod for wide/wide_opmatrix.hpp as a start.

johnfea commented 3 months ago

I enabled ustringhash_pod for all files in liboslexec/wide/. However, there's issues with the patch. All string parameters there are now ustringhash_pod and according to this error i64 is being inputted for strings, but its giving type mismatch, not sure why.

Call parameter type does not match function signature!
i64 -3801545496781945523
 i8*  %48 = call i32 @osl_b8_AVX2_build_transform_matrix_Wmss_masked(i8* %45, i8* nonnull %37, i64 -3801545496781945523, i64 -2374949479891991091, i32 %47)

EDIT: It was fixed in src/liboslexec/builtindecl_wide_xmacro.h replacing X with s for string parameters.

johnfea commented 3 months ago

There's now more code to support ustringhash_pod for batched. Some string variables from .osl files get passed as ustrings into liboslexec/wide functions while ustringhash is expected, I didn't find where that is set.

AlexMWells commented 3 months ago

I think we left the batched version alone (not using hashes where possible) mostly to just limit the scope of the code changes with the original hashcode changes which were mostly targeting GPU work happening on the scalar (non-batched) code path. And of course there is some shared code which pays with some confusion on which parameters it's actually aking.

I don't think there is problem with moving batched to ustringhash, the tests are comprehensive and will probably catch everything

AlexMWells commented 3 months ago

There's now more code to support ustringhash_pod for batched. Some string variables from .osl files get passed as ustrings into liboslexec/wide functions while ustringhash is expected, I didn't find where that is set.

You might need to look at each failing call, most likely batched_llvm_gen.cpp.

AlexMWells commented 3 months ago

Just one comment on converting to hashcode vs. ustrings, for "real" string operations that are not constant folded (CPU only), there will be more overhead when we do the lookup of the hashcode to real string value to perform the string op Before:

        ustring str        = wS[lane];
        ustring sep        = wSep[lane];

After:

        ustring str        = ustring_from(wS[lane]);
        ustring sep        = ustring_from(wSep[lane]);

Counter argument was if you had dynamic string operations, good chance that is a much bigger performance penalty than a couple lookups.

sfriedmapixar commented 3 months ago

Regarding moving ustrings to hashes -- the primary reason to do this is to have compiled GPU shaders hit in NVidia's compiler cache, thus it is currently of no benefit to the CPU. Because the non-simd CPU code shares a bunch of declarations for this, they ended up getting changed as well. For the specialized case of CPU SIMD code, in the absence of bugs, we've found in our renderer it makes much more sense to leave it as ustrings and avoid the expensive ustring_from transformations and have strings be easy to inspect in a debugger than to change to the slower and more difficult to use ustringhash implementation. Was this only to try and fix bugs you were seeing with hash values being passed in, or were there other reasons you had?

Also, it would be much better to separate the minimal test tool out into a separate pull request from any major ustring<->ustringhash changes.

johnfea commented 3 months ago

Was this only to try and fix bugs you were seeing with hash values being passed in, or were there other reasons you had?

My first reason was performance. For example, hashing due to calling transform("myspace") was taking around 1% of rendering time in batched mode only. EDIT2: It wasn't unnecessary hashing I was seeing in a profiler but unnecessary hash->ustring conversions, anyway all that I could find in a profiler and some by looking at code are now removed by this patch and were partly due to wide/ code expecting user interface functions to use ustring format but they already use ustringhash.

All of OSL interface code for batched seems to already use hashes. Hashes are great to use in a switch {} from performance standpoint, where the expected cases will be compile time constants. This is much faster than doing string comparisons or map lookups with a string key. This is relevant at least in renderer implementations of get_matrices, get_attributes and when parsing closure tree and inside OSL too for example when comparing noise function names and transform spaces names, some of which could be changed to switches. Hashes might be slower when printing and doing some string operations, but is much of that being run when doing production rendering?

EDIT2: Also, ustringhash_pod can be passed in CPU registers into functions which might give a small performance benefit.

However, the reason I ended up changing it here was to fix this testcase so another reason to consider it is maintainability and code complexity. Having non-batched and batched use different string representation internally can lead to more complex and more error prone code.

Also, it would be much better to separate the minimal test tool out into a separate pull request from any major ustring<->ustringhash changes.

Agreed, I can make those a separate PR. EDIT: Done.

johnfea commented 3 months ago

Some string variables from .osl files get passed as ustrings into liboslexec/wide functions while ustringhash is expected, I didn't find where that is set.

You might need to look at each failing call, most likely batched_llvm_gen.cpp.

Here's an example:

testsuite/array-reg/test_varying_index_string.osl:

string indirectR = rarray[varyingIndexR];
Cout[0] = stoi(indirectR)/256.0;

indirectR gets passed as ustring into stoi() in batched mode. Here's some relevant debug output with OSL_DEV_ONLY and some of my debug prints:

llvm_alloca indirectRNAME: indirectR
string
 n=1 t.size()=8 as VARYING
...
call llvm_gen_generic
generic handle symbol: indirectR
...
lvm_gen_generic osl_b8_AVX2_stoi_WiWs
>>return value is pointer osl_b8_AVX2_stoi_WiWs
....pushing $tmp5 as void_ptr
....pushing indirectR as void_ptr

What I'm looking for here is where in OSL codebase variable indirectR gets set to ustring instead of ustringhash. Some of the places I looked at are:

batched_backendllvm.cpp: llvm_alloca()
batched_llvm_instance.cpp: BatchedBackendLLVM::llvm_assign_initial_value()
johnfea commented 3 months ago

Just one comment on converting to hashcode vs. ustrings, for "real" string operations that are not constant folded (CPU only), there will be more overhead when we do the lookup of the hashcode to real string value to perform the string op

Yes. I already commented this little from performance standpoint above, but additionally: This patch currently uses real ustrings for llvm_gen_printf() variadic arguments just because changing Strutil::vsprintf to support hashes was non-trivial. I'm not sure if this always works, however, unless all of those arguments are always are created in llvm_gen_printf().

But if some mixing can work, then it would be possible to use real ustrings for strings ops or for some string ops on CPU and hashes everywhere else.

johnfea commented 3 months ago

testsuite/array-reg/test_varying_index_string.osl:

I did manage to narrow this one down. The problem is that string array constants are stored as char* pointers or ustrings and then passed as i64 that are actually pointers through everything into stoi().

In non-batched mode, constant allocation happens at llvm_create_constant(), there's even TODO there to use hashes instead. I didn't find where the same thing happens for batched mode, one way to a fix would be to use hashes instead there.

Another way to a possible fix would be to do conversion to hash at BatchedBackendLLVM::llvm_store_value(), non-batched mode already does one conversion like that there. Not sure about what code would achieve that, non-batched mode conversion looked like pointer cast and applying it just seemed to brake the format which was already correct, i64.

Another possibility is to do the conversion to hash at BatchedBackendLLVM::llvm_load_value() which gets called for that const symbol before llvm_store_value(), which also calls llvm_get_pointer() for that symbol.

EDIT: Also, everything I'm using in OSL seems to already work with in batched mode with this patch so I'm no longer in a hurry to get this working. I also tried a smaller patch closurefix that only changes to use ustringhash in batched llvm_gen_closure() llvm::Value* loaded = rop.llvm_load_value() but that didn't work.

johnfea commented 3 months ago

Now this patch also makes batched OSL compile in Windows with MSVC and Mingw and run at least with Mingw, attempting to fix #1645. There's overlap with make batched use ustringhash_pod patch so they are currently a combined patch.

This part is not exactly ready since it disables functionality, but it does run and highlights the problem parts.

It looks like code in liboslexec/wide depends on Linux way of loading symbols, where loaded shared lib can access symbols from loading shared lib, which somehow differs from how Windows handles this. Unless there's some kind of linker options that solves this, I think a solution would be to make oslexec.dll supply wide dlls with callback functions for e.g. error_fmt() and few other functions that are called this way. As is, this patch just comments out those calls from liboslexec/wide.

lgritz commented 4 weeks ago

I'm not sure I understand why this is failing CI. Can you rebase to current main and try pushing again to trigger a CI run?

johnfea commented 4 weeks ago

"Make batched use ustringhash_pod" patch works but is still missing string array constant support, that alone fails CI. I would really ask help from someone who knows the code better to fix that. See discussion above.

"make batched compile in Windows" patch works but disables features which likely fail CI. It probably belongs in a separate patch that is worked on only after "make batched use ustringhash_pod" has been merged.